Today, we are excited to announce that the Mistral 7B foundation models, developed by Mistral AI, are available for customers through Amazon SageMaker JumpStart to deploy with one click for running inference. With 7 billion parameters, Mistral 7B can be easily customized and quickly deployed. You can try out this model with SageMaker JumpStart, a […]
( 14 min )
According to Gartner, 85% of software buyers trust online reviews as much as personal recommendations. Customers provide feedback and reviews about products they have purchased through many channels, including review websites, vendor websites, sales calls, social media, and many others. The problem with the increasing volume of customer reviews across multiple channels is that it […]
( 7 min )
A recommendation engine is only as good as the data used to prepare it. Transforming raw data into a format that is suitable for a model is key to getting better personalized recommendations for end-users. In this post, we walk through how to prepare and import the MovieLens dataset, a dataset prepared by GroupLens research […]
( 11 min )
Posted by Sagar M. Waghmare, Senior Software Engineer, and Kimberly Wilber, Software Engineer, Google Research, Perception Team
As most people navigate their everyday world, they process visual input from the environment using an eye-level perspective. Unlike robots and self-driving cars, people don't have any "out-of-body" sensors to help guide them. Instead, a person’s sensory input is completely "egocentric", or "from the self." This also applies to new technologies that understand the world around us from a human-like perspective, e.g., robots navigating through unknown buildings, AR glasses that highlight objects, or assistive technology to help people run independently.
In computer vision, scene understanding is the subfield that studies how visible objects relate to the sce…
( 93 min )
We use the maximum a posteriori estimation principle for learning
representations distributed on the unit sphere. We propose to use the angular
Gaussian distribution, which corresponds to a Gaussian projected onto the unit sphere, and derive the associated loss function. We also consider the von Mises-Fisher distribution, which is the conditional of a Gaussian on the unit sphere. The learned representations are pushed toward fixed directions, which are the prior means of the Gaussians, allowing for a learning strategy
that is resilient to data drift. This makes it suitable for online continual
learning, which is the problem of training neural networks on a continuous data
stream, where multiple classification tasks are presented sequentially so that
data from past tasks are no longer accessible, and data from the current task
can be seen only once. To address this challenging scenario, we propose a
memory-based representation learning technique equipped with our new loss
functions. Our approach does not require negative data or knowledge of task
boundaries and performs well with smaller batch sizes while being
computationally efficient. We demonstrate with extensive experiments that the
proposed method outperforms the current state-of-the-art methods on both
standard evaluation scenarios and realistic scenarios with blurry task
boundaries. For reproducibility, we use the same training pipeline for every
compared method and share the code at https://t.ly/SQTj.
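To make the loss concrete: a minimal sketch of a von Mises-Fisher-style objective that pushes a unit-normalized embedding toward a fixed prior direction, up to the distribution's normalizing constant. The function name and the `kappa` value are illustrative, not taken from the paper:

```python
import numpy as np

def vmf_loss(z, mu, kappa=10.0):
    """Negative vMF log-likelihood, up to its normalizing constant:
    minimized when the normalized embedding z aligns with the fixed
    prior mean direction mu of its class."""
    z = z / np.linalg.norm(z)      # project the embedding onto the unit sphere
    mu = mu / np.linalg.norm(mu)   # fixed prior mean direction (a unit vector)
    return -kappa * float(z @ mu)

mu = np.array([1.0, 0.0, 0.0])
aligned = vmf_loss(np.array([2.0, 0.0, 0.0]), mu)     # cosine 1 -> loss -kappa
orthogonal = vmf_loss(np.array([0.0, 3.0, 0.0]), mu)  # cosine 0 -> loss 0
```

Because the target directions are fixed rather than learned, the loss for old classes does not shift as new data arrives, which is what makes such a strategy resilient to drift.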
( 3 min )
Markov Decision Processes (MDPs) are a formal framework for modeling and
solving sequential decision-making problems. In finite-time horizons such
problems are relevant for instance for optimal stopping or specific supply
chain problems, but also in the training of large language models. In contrast
to infinite-horizon MDPs, optimal policies are not stationary: policies must be learned for every single epoch. In practice, all parameters are often trained
simultaneously, ignoring the inherent structure suggested by dynamic
programming. This paper introduces a combination of dynamic programming and
policy gradient called dynamic policy gradient, where the parameters are
trained backwards in time. For the tabular softmax parametrisation we carry out
the convergence analysis for simultaneous and dynamic policy gradient towards
global optima, both in the exact and sampled gradient settings without
regularisation. It turns out that the use of dynamic policy gradient training
exploits the structure of finite-time problems much better, which is reflected in improved convergence bounds.
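A minimal sketch of the backward-in-time training order on an invented two-state, horizon-two MDP, using exact gradients and the tabular softmax parametrisation; the MDP, step sizes, and iteration counts are illustrative only:

```python
import numpy as np

def softmax(th):
    e = np.exp(th - th.max())
    return e / e.sum()

# Toy finite-horizon MDP: states {0, 1}, actions {0, 1}, horizon 2.
# Taking action a moves the agent to state a; only epoch-1 rewards are nonzero.
R1 = np.array([[0.0, 1.0],   # r_1(s=0, a)
               [2.0, 0.0]])  # r_1(s=1, a)

# Dynamic policy gradient, sketched: train the *last* epoch first, then
# train the earlier epoch against the values the later policy achieves.
theta1 = np.zeros((2, 2))            # epoch-1 softmax parameters, per state
for _ in range(500):                 # exact policy gradient ascent
    for s in range(2):
        pi = softmax(theta1[s])
        theta1[s] += 0.5 * pi * (R1[s] - pi @ R1[s])
V1 = np.array([softmax(theta1[s]) @ R1[s] for s in range(2)])

theta0 = np.zeros(2)                 # epoch-0 parameters (fixed start state, r_0 = 0)
for _ in range(500):
    pi = softmax(theta0)
    theta0 += 0.5 * pi * (V1 - pi @ V1)   # next state equals the chosen action
pi0 = softmax(theta0)                # should favor action 1, which leads to state 1
```

Training `theta0` only against the already-converged epoch-1 values is exactly the dynamic-programming structure the paper argues simultaneous training ignores.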
( 2 min )
Detecting and discovering new gene interactions based on known gene
expressions and gene interaction data presents a significant challenge. Various
statistical and deep learning methods have attempted to tackle this challenge
by leveraging the topological structure of gene interactions and gene
expression patterns to predict novel gene interactions. In contrast, some
approaches have focused exclusively on utilizing gene expression profiles. In
this context, we introduce GENER, a parallel-layer deep learning network
designed exclusively for the identification of gene-gene relationships using
gene expression data. We conducted two training experiments and compared the
performance of our network with that of existing statistical and deep learning
approaches. Notably, our model achieved an average AUROC score of 0.834 on the
combined BioGRID&DREAM5 dataset, outperforming competing methods in predicting
gene-gene interactions.
( 2 min )
We propose a new gradient descent algorithm with added stochastic terms for
finding the global optimizers of nonconvex optimization problems. A key
component in the algorithm is the adaptive tuning of the randomness based on
the value of the objective function. In the language of simulated annealing,
the temperature is state-dependent. With this, we prove the global convergence
of the algorithm with an algebraic rate both in probability and in the
parameter space. This is a significant improvement over the classical rate obtained with a more straightforward control of the noise term. The convergence proof
is based on the actual discrete setup of the algorithm, not just its continuous
limit as often done in the literature. We also present several numerical
examples to demonstrate the efficiency and robustness of the algorithm for
reasonably complex objective functions.
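A 1D sketch of the idea, with an invented double-well objective; `m` plays the role of an assumed lower bound on min f, so the injected noise (the "temperature") shrinks as the iterate approaches the global optimum:

```python
import numpy as np

def f(x):  return (x**2 - 1.0)**2 + 0.3 * x        # double well, global min near x = -1
def df(x): return 4.0 * x * (x**2 - 1.0) + 0.3

rng = np.random.default_rng(0)
lr, m = 0.01, -0.4        # m: an assumed lower bound on min f (illustrative)
x = 1.0                   # start inside the basin of the *local* minimum
x_best, f_best = x, f(x)
for _ in range(20000):
    T = max(f(x) - m, 0.0)                         # state-dependent temperature
    x = x - lr * df(x) + np.sqrt(2.0 * lr * T) * rng.standard_normal()
    if f(x) < f_best:
        x_best, f_best = x, f(x)
```

Near the local minimum the objective value, and hence the temperature, stays high enough to kick the iterate over the barrier; near the global minimum the noise dies down and the iterate settles.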
( 2 min )
An extension of Transformers is proposed that enables explicit relational
reasoning through a novel module called the Abstractor. At the core of the
Abstractor is a variant of attention called relational cross-attention. The
approach is motivated by an architectural inductive bias for relational
learning that disentangles relational information from extraneous features
about individual objects. This enables explicit relational reasoning,
supporting abstraction and generalization from limited data. The Abstractor is
first evaluated on simple discriminative relational tasks and compared to
existing relational architectures. Next, the Abstractor is evaluated on purely
relational sequence-to-sequence tasks, where dramatic improvements are seen in
sample efficiency compared to standard Transformers. Finally, Abstractors are
evaluated on a collection of tasks based on mathematical problem solving, where
modest but consistent improvements in performance and sample efficiency are
observed.
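A rough sketch of relational cross-attention as described: attention weights are computed between objects, but the values attended over are learned, input-independent symbols. All shapes and names here are illustrative, not the paper's implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def relational_cross_attention(E, Wq, Wk, S):
    """Attention scores come from relations *between* input objects E,
    but the values are input-independent learned 'symbol' vectors S, so
    the output encodes relations disentangled from object features."""
    Q, K = E @ Wq, E @ Wk
    A = softmax(Q @ K.T / np.sqrt(K.shape[-1]))   # object-object relation matrix
    return A @ S                                   # mix symbols, not object values

rng = np.random.default_rng(0)
n, d, ds = 4, 8, 5
E = rng.standard_normal((n, d))                    # 4 objects, 8 features each
Wq, Wk = rng.standard_normal((d, d)), rng.standard_normal((d, d))
S = rng.standard_normal((n, ds))                   # learned symbols, one per position
out = relational_cross_attention(E, Wq, Wk, S)     # shape (4, 5)
```

The key contrast with standard self-attention is the last line: the output is a mixture of the symbols S, so object-level features reach it only through the relation matrix.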
( 2 min )
In this paper, we introduce a novel class of graphical models for
representing time lag specific causal relationships and independencies of
multivariate time series with unobserved confounders. We completely
characterize these graphs and show that they constitute proper subsets of the
currently employed model classes. As we show, from the novel graphs one can
thus draw stronger causal inferences -- without additional assumptions. We
further introduce a graphical representation of Markov equivalence classes of
the novel graphs. This graphical representation contains more causal knowledge
than what current state-of-the-art causal discovery algorithms learn.
( 2 min )
We propose a graphical structure for structural equation models that is
stable under marginalization, assuming linearity and Gaussianity. We
show that computing the maximum likelihood estimation of this model is
equivalent to training a neural network. We implement a GPU-based algorithm
that computes the maximum likelihood estimation of these models.
( 2 min )
Neural networks have shown remarkable performance in computer vision, but
their deployment in numerous scientific and technical fields is challenging due
to their black-box nature. Scientists and practitioners need to evaluate the
reliability of a decision, i.e., to know simultaneously if a model relies on
the relevant features and whether these features are robust to image
corruptions. Existing attribution methods aim to provide human-understandable
explanations by highlighting important regions in the image domain, but fail to
fully characterize a decision process's reliability. To bridge this gap, we
introduce the Wavelet sCale Attribution Method (WCAM), a generalization of
attribution from the pixel domain to the space-scale domain using wavelet
transforms. Attribution in the wavelet domain reveals where {\it and} on what
scales the model focuses, thus enabling us to assess whether a decision is
reliable.
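The space-scale idea can be illustrated with a one-level Haar transform, a crude stand-in for the wavelet transforms WCAM builds on; the actual attribution computation in the paper differs:

```python
import numpy as np

def haar2d(x):
    """One-level 2D Haar transform: an approximation band plus three
    detail bands (horizontal, vertical, diagonal), i.e. a space-scale
    decomposition of the image."""
    b00, b01 = x[0::2, 0::2], x[0::2, 1::2]
    b10, b11 = x[1::2, 0::2], x[1::2, 1::2]
    a = (b00 + b01 + b10 + b11) / 2.0   # coarse scale
    h = (b00 - b01 + b10 - b11) / 2.0   # horizontal detail
    v = (b00 + b01 - b10 - b11) / 2.0   # vertical detail
    d = (b00 - b01 - b10 + b11) / 2.0   # diagonal detail
    return a, h, v, d

# A checkerboard has all of its energy at the finest scale: the transform
# assigns everything to the diagonal detail band.
x = np.tile([[1.0, -1.0], [-1.0, 1.0]], (2, 2))
a, h, v, d = haar2d(x)
```

Attributing a decision in this domain, rather than over raw pixels, reveals not just where the model looks but at which scale, e.g. fine texture versus coarse shape.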
( 3 min )
We identify hidden layers inside a DNN with group actions on the data space,
and formulate the DNN as a dual voice transform with respect to the Koopman operator, a linear representation of the group action. Based on the group
theoretic arguments, particularly by using Schur's lemma, we show a simple
proof of the universality of those DNNs.
( 2 min )
We study the problem of training a flow-based generative model, parametrized
by a two-layer autoencoder, to sample from a high-dimensional Gaussian mixture.
We provide a sharp end-to-end analysis of the problem. First, we provide a
tight closed-form characterization of the learnt velocity field, when
parametrized by a shallow denoising auto-encoder trained on a finite number $n$
of samples from the target distribution. Building on this analysis, we provide
a sharp description of the corresponding generative flow, which pushes the base
Gaussian density forward to an approximation of the target density. In
particular, we provide closed-form formulae for the distance between the mean
of the generated mixture and the mean of the target mixture, which we show
decays as $\Theta_n(\frac{1}{n})$. Finally, this rate is shown to be in fact
Bayes-optimal.
( 2 min )
While artificial neural networks have demonstrated exceptional practical
success in a variety of domains, investigations into their theoretical
characteristics, such as their approximation power, statistical properties, and
generalization performance, have concurrently made significant strides. In this
paper, we construct a novel theory for understanding the effectiveness of
neural networks, which offers a perspective distinct from prior research.
Specifically, we explore the rationale underlying a common practice during the
construction of neural network models: sample splitting. Our findings indicate
that the optimal hyperparameters derived from sample splitting can enable a
neural network model that asymptotically minimizes the prediction risk. We
conduct extensive experiments across different application scenarios and
network architectures, and the results confirm our theory's effectiveness.
( 2 min )
We propose conditional flows of the maximum mean discrepancy (MMD) with the
negative distance kernel for posterior sampling and conditional generative
modeling. This MMD, which is also known as energy distance, has several
advantageous properties like efficient computation via slicing and sorting. We
approximate the joint distribution of the ground truth and the observations
using discrete Wasserstein gradient flows and establish an error bound for the
posterior distributions. Further, we prove that our particle flow is indeed a
Wasserstein gradient flow of an appropriate functional. The power of our method
is demonstrated by numerical examples including conditional image generation
and inverse problems like superresolution, inpainting and computed tomography
in low-dose and limited-angle settings.
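In one dimension, the negative-distance-kernel MMD (the energy distance) reduces to sorted prefix sums, which is the "efficient computation via slicing and sorting" mentioned above; a sketch, with the slicing step (averaging over random 1D projections) omitted:

```python
import numpy as np

def mean_abs_within(x):
    """E|x - x'| over all ordered pairs, in O(n log n) via sorting."""
    a, n = np.sort(x), len(x)
    return 2.0 * np.sum((2.0 * np.arange(n) - n + 1) * a) / n**2

def mean_abs_cross(x, y):
    """E|x - y| via one sort of y plus prefix sums."""
    y = np.sort(y)
    cs = np.concatenate(([0.0], np.cumsum(y)))
    k = np.searchsorted(y, x)                    # number of y-values below each x
    below = x * k - cs[k]                        # sum of (x - y_j) over y_j < x
    above = (cs[-1] - cs[k]) - x * (len(y) - k)  # sum of (y_j - x) over y_j >= x
    return np.sum(below + above) / (len(x) * len(y))

def energy_distance(x, y):
    return 2.0 * mean_abs_cross(x, y) - mean_abs_within(x) - mean_abs_within(y)

d = energy_distance(np.array([0.0, 1.0, 3.0]), np.array([2.0, 5.0]))
```

The same quantity computed by the naive double loop costs O(nm); sorting brings it down to O((n + m) log(n + m)) per slice, which is what makes the particle flow cheap.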
( 2 min )
In this post, we elucidate the simple yet powerful idea of combining user profiles and item attributes to generate personalized content recommendations using LLMs. As demonstrated throughout the post, these models hold immense potential in generating high-quality, context-aware input text, which leads to enhanced recommendations. To illustrate this, we guide you through the process of integrating a feature store (representing user profiles) with an LLM to generate these personalized recommendations.
( 13 min )
In this post, we provide an overview of popular multimodality models. We also demonstrate how to deploy these pre-trained models on Amazon SageMaker. Furthermore, we discuss the diverse applications of these models, focusing particularly on several real-world scenarios, such as zero-shot tag and attribution generation for ecommerce and automatic prompt generation from images.
( 13 min )
A research team is aiming to shake up the status quo for earthquake models. Researchers from the Universities of California at Berkeley and Santa Cruz, and the Technical University of Munich recently released a paper describing a new model that delivers deep learning to earthquake forecasting. Dubbed RECAST, the model can use larger datasets and […]
( 6 min )
A persistent challenge in deep learning is optimizing neural network models for diverse hardware configurations, balancing performance and low latency. Learn how SpaceEvo automates hardware-aware neural architecture search to fine-tune DNN models for swift execution on diverse devices.
The post Efficient and hardware-friendly neural architecture search with SpaceEvo appeared first on Microsoft Research.
( 10 min )
In this post, we explain how to build and optimize a custom classification model using Amazon Comprehend. We demonstrate this using an Amazon Comprehend custom classification to build a multi-label custom classification model, and provide guidelines on how to prepare the training dataset and tune the model to meet performance metrics such as accuracy, precision, recall, and F1 score.
( 8 min )
Large language models (LLMs) have captured the imagination and attention of developers, scientists, technologists, entrepreneurs, and executives across several industries. These models can be used for question answering, summarization, translation, and more in applications such as conversational agents for customer support, content creation for marketing, and coding assistants. Recently, Meta released Llama 2 for both […]
( 7 min )
Amid the race to make AI bigger and better, Lincoln Laboratory is developing ways to reduce power, train efficiently, and make energy use transparent.
( 11 min )
HoloAssist is a new multimodal dataset consisting of 166 hours of interactive task executions with 222 participants. Discover how it offers invaluable data to advance the capabilities of next-gen AI copilots for real-world tasks.
The post HoloAssist: A multimodal dataset for next-gen AI copilots for the physical world appeared first on Microsoft Research.
( 10 min )
Connecting with researchers, collaborating across disciplines, and exploring a new city—PhD students Jennifer Scurrell and Alejandro Cuevas talk to Senior Researcher Madeleine Daepp about the internship experience at Microsoft Research.
The post Intern Insights: Dr. Madeleine Daepp with Jennifer Scurrell and Alejandro Cuevas appeared first on Microsoft Research.
( 29 min )
Just as athletes train for a game or actors rehearse for a performance, surgeons prepare ahead of an operation. Now, Atlas Meditech is letting brain surgeons experience a new level of realism in their pre-surgery preparation with AI and physically accurate simulations. Atlas Meditech, a brain-surgery intelligence platform, is adopting tools — including the MONAI […]
( 7 min )
October brings more than falling leaves and pumpkin spice lattes for GeForce NOW members. Get ready for nearly 60 new games to stream, including Forza Motorsport and 16 more PC Game Pass titles. Assassin’s Creed Mirage leads 29 new games to hit the GeForce NOW library this week. In addition, catch a challenge to earn […]
( 9 min )
For NVIDIA Senior AI Scientist Jim Fan, the video game Minecraft served as the “perfect primordial soup” for his research on open-ended AI agents. In the latest AI Podcast episode, host Noah Kravitz spoke with Fan on using large language models to create AI agents — specifically to create Voyager, an AI bot built with […]
( 6 min )
This September, I had the chance to attend the Heidelberg Laureate Forum (HLF) for the second — and probably last — time. The HLF is an incredible experience for young researchers: mirroring the Lindau Nobel Laureate Meetings, the organizers invite laureates from math and computer science together with young researchers pursuing their undergraduate, graduate or post-doc studies. In this article, I want to share impressions and encourage students to apply next year!
The post My Impressions (and Application) of the Heidelberg Laureate Forum 2023 appeared first on David Stutz.
( 7 min )
Analyzing medical images plays a crucial role in diagnosing and treating diseases. The ability to automate this process using machine learning (ML) techniques allows healthcare professionals to more quickly diagnose certain cancers, coronary diseases, and ophthalmologic conditions. However, one of the key challenges faced by clinicians and researchers in this field is the time-consuming and […]
( 11 min )
Healthcare and life sciences (HCLS) customers are adopting generative AI as a tool to get more from their data. Use cases include document summarization to help readers focus on key points of a document and transforming unstructured text into standardized formats to highlight important attributes. With unique data formats and strict regulatory requirements, customers are […]
( 9 min )
Prior authorization is a crucial process in healthcare that involves the approval of medical treatments or procedures before they are carried out. This process is necessary to ensure that patients receive the right care and that healthcare providers are following the correct procedures. However, prior authorization can be a time-consuming and complex process that requires […]
( 7 min )
One of the most impressive generative AI applications I have seen is viperGPT. The image / site explains it best. This example, earlier this year, showed the potential of multimodal LLMs, and as of last week, that future is upon us: ChatGPT can now see, hear & speak. What are the implications…
The post Generative AI Megatrends: ChatGPT can see, hear and speak – but what does it mean when ChatGPT can think? appeared first on Data Science Central.
( 20 min )
In the ever-evolving landscape of the digital era, the relentless quest for deriving actionable insights from a sea of information has become the cornerstone of innovation and strategy. As businesses and organizations strive to navigate the complex corridors of big data, the spotlight invariably falls upon the expertise of data scientists, the modern-day architects of…
The post Cracking the code: The rising demand for data scientists in various industries appeared first on Data Science Central.
( 21 min )
I recently subscribed to OpenAI GPT-4 for the Code Interpreter/Advanced Data Analytics feature. We are using it in our class at the University of Oxford. It's really cool, and we are also awaiting the multimodal OpenAI features. Recently, a well-known AI critic said that he does not see how generative AI companies could be…
The post Generative AI megatrends: How many LLMs would you subscribe to? appeared first on Data Science Central.
( 19 min )
Designed to ensure safer skies, “Air-Guardian” blends human intuition with machine precision, creating a more symbiotic relationship between pilot and aircraft.
( 8 min )
A diverse research ecosystem is essential to realizing the promise of AI. Accelerate Foundation Models Research aims to expand access to powerful models, engaging academics outside of computer science to pursue a broad range of important opportunities.
The post Accelerate Foundation Models Research: Supporting a global academic research ecosystem for AI appeared first on Microsoft Research.
( 10 min )
With the help of AI, robots, tractors and baby strollers — even skate parks — are becoming autonomous. One developer, Kabilan KB, is bringing autonomous-navigation capabilities to wheelchairs, which could help improve mobility for people with disabilities. The undergraduate from the Karunya Institute of Technology and Sciences in Coimbatore, India, is powering his autonomous wheelchair […]
( 6 min )
Releasing a 3D tutorial dubbed The Easiest VFX Tutorial Ever takes supreme confidence and the skills to back it up. Steve Lund a.k.a. CG Geek — the featured artist of this week’s In the NVIDIA Studio installment — has both in spades.
( 8 min )
Today, we are excited to announce Code Llama foundation models, developed by Meta, are available for customers through Amazon SageMaker JumpStart to deploy with one click for running inference. Code Llama is a state-of-the-art large language model (LLM) capable of generating code and natural language about code from both code and natural language prompts. Code […]
( 11 min )
A successful deployment of a machine learning (ML) model in a production environment heavily relies on an end-to-end ML pipeline. Although developing such a pipeline can be challenging, it becomes even more complex when dealing with an edge ML use case. Machine learning at the edge is a concept that brings the capability of running […]
( 10 min )
In Part 1 of this series, we drafted an architecture for an end-to-end MLOps pipeline for a visual quality inspection use case at the edge. It is architected to automate the entire machine learning (ML) process, from data labeling to model training and deployment at the edge. The focus on managed and serverless services reduces […]
( 9 min )
This is Part 3 of our series where we design and implement an MLOps pipeline for visual quality inspection at the edge. In this post, we focus on how to automate the edge deployment part of the end-to-end MLOps pipeline. We show you how to use AWS IoT Greengrass to manage model inference at the […]
( 9 min )
By focusing on causal relationships in genome regulation, a new AI method could help scientists identify new immunotherapy techniques or regenerative therapies.
( 10 min )
AI Weirdness: the strange side of machine learning
( 2 min )
We introduce RACH-Space, a novel classification method in ensemble learning.
In particular, we show its applicability as a label model for weakly supervised
learning. RACH-Space offers simplicity in implementation with minimal
assumptions on the data or weak signals. The model is well suited for scenarios
where fully labeled data is not available. Our method is built upon a geometrical interpretation of the space spanned by weak signals. Our analysis of the high-dimensional convex hull structure underlying a general set of weak signals
bridges geometry with machine learning. Empirical results also demonstrate that
RACH-Space works well in practice and compares favorably to best existing label
models for weakly supervised learning.
( 2 min )
In this paper, we introduce a novel analysis of neural networks based on
geometric (Clifford) algebra and convex optimization. We show that optimal
weights of deep ReLU neural networks are given by the wedge product of training
samples when trained with standard regularized loss. Furthermore, the training
problem reduces to convex optimization over wedge product features, which
encode the geometric structure of the training dataset. This structure is given
in terms of signed volumes of triangles and parallelotopes generated by data
vectors. The convex problem finds a small subset of samples via $\ell_1$
regularization to discover only relevant wedge product features. Our analysis
provides a novel perspective on the inner workings of deep neural networks and
sheds light on the role of the hidden layers.
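The "signed volumes" in question are ordinary determinants; a small illustrative sketch (not the paper's training procedure) of how such wedge-product features arise from data vectors:

```python
import numpy as np

def signed_volume(vectors):
    """Signed volume of the parallelotope spanned by the rows: the scalar
    carried by the wedge product v_1 ^ ... ^ v_d in d dimensions.
    Swapping two rows flips the orientation, hence the sign."""
    return float(np.linalg.det(np.asarray(vectors, dtype=float)))

# 2D example with two data vectors: parallelogram and triangle volumes.
a, b = [1.0, 0.0], [0.0, 2.0]
par = signed_volume([a, b])                                   # oriented area
tri = 0.5 * signed_volume([np.subtract(b, a),                 # triangle with
                           np.subtract([0.0, 0.0], a)])       # vertices a, b, 0
```

Features of this form encode the geometric structure of the training set that, per the analysis above, determines the optimal ReLU-network weights.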
( 2 min )
Property prediction plays an important role in material discovery. As an
initial step to eventually develop a foundation model for material science, we
introduce a new autoencoder called MHG-GNN, which combines a graph neural network (GNN) with a Molecular Hypergraph Grammar (MHG). Results on a variety of
property prediction tasks with diverse materials show that MHG-GNN is
promising.
( 2 min )
Consistency regularization-based methods are prevalent in semi-supervised
learning (SSL) algorithms due to their exceptional performance. However, they
mainly depend on domain-specific data augmentations, which are not usable in
domains where data augmentations are less practicable. On the other hand,
Pseudo-labeling (PL) is a general and domain-agnostic SSL approach that, unlike
consistency regularization-based methods, does not rely on domain-specific augmentations. PL
underperforms due to the erroneous high-confidence predictions from poorly
calibrated models. This paper proposes an uncertainty-aware pseudo-label
selection framework that employs uncertainty sets yielded by the conformal
regularization algorithm to correct the poor calibration of neural networks, reducing noisy training data. The code for this work is available at:
https://github.com/matinmoezzi/ups conformal classification
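The paper's exact selection rule may differ; as a sketch, the standard split-conformal construction yields an uncertainty set per sample, and a pseudo-label is kept only when that set is a singleton (the calibration numbers below are invented):

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split-conformal (1 - alpha) threshold on the nonconformity
    score 1 - p(true class), from a held-out calibration set."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    level = np.ceil((n + 1) * (1.0 - alpha)) / n
    return np.quantile(scores, min(level, 1.0), method="higher")

def select_pseudo_labels(probs, q):
    """Keep an unlabeled sample only if its conformal set is a singleton."""
    in_set = (1.0 - probs) <= q          # class c is in the set iff 1 - p_c <= q
    keep = in_set.sum(axis=1) == 1
    return keep, probs.argmax(axis=1)

# Calibration: 9 two-class samples, all of true class 0, scores 0.1 ... 0.9.
cal_probs = np.stack([1.0 - np.linspace(0.1, 0.9, 9),
                      np.linspace(0.1, 0.9, 9)], axis=1)
q = conformal_threshold(cal_probs, np.zeros(9, dtype=int))
keep, labels = select_pseudo_labels(np.array([[0.95, 0.05],   # confident: singleton
                                              [0.50, 0.50]]), # ambiguous: size-2 set
                                    q)
```

Filtering on set size rather than raw confidence is what protects pseudo-labeling from the overconfident mistakes of a poorly calibrated model.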
( 2 min )
Adversarial Machine Learning (AML) is a rapidly growing field of security
research, with an often overlooked area being model attacks through
side-channels. Previous works show such attacks to be serious threats, though
little progress has been made on efficient remediation strategies that avoid
costly model re-engineering. This work demonstrates a new defense against AML
side-channel attacks using model compilation techniques, namely tensor
optimization. We show relative model attack effectiveness decreases of up to
43% using tensor optimization, and discuss the implications and directions for future work.
( 2 min )
In this paper, we introduce two types of novel Asymptotic-Preserving
Convolutional Deep Operator Networks (APCONs) designed to address the
multiscale time-dependent linear transport problem. We observe that the vanilla
physics-informed DeepONets with modified MLP may exhibit instability in
maintaining the desired limiting macroscopic behavior. This necessitates an asymptotic-preserving loss function. Drawing
inspiration from the heat kernel in the diffusion equation, we propose a new
architecture called Convolutional Deep Operator Networks, which employ multiple
local convolution operations instead of a global heat kernel, along with
pooling and activation operations in each filter layer. Our APCON methods
possess a parameter count that is independent of the grid size and are capable
of capturing the diffusive behavior of the linear transport problem. Finally,
we validate the effectiveness of our methods through several numerical
examples.
( 2 min )
This paper studies the problem of learning the large-scale Gaussian graphical
models that are multivariate totally positive of order two ($\text{MTP}_2$). By
introducing the concept of bridge, which commonly exists in large-scale sparse
graphs, we show that the entire problem can be equivalently optimized through
(1) several smaller-scaled sub-problems induced by a \emph{bridge-block
decomposition} on the thresholded sample covariance graph and (2) a set of
explicit solutions on entries corresponding to \emph{bridges}. From a practical aspect, this simple and provable scheme can be applied to break a large problem down into small, tractable ones, leading to an enormous reduction in computational complexity and substantial improvements for all existing
algorithms. The synthetic and real-world experiments demonstrate that our
proposed method presents a significant speed-up compared to the
state-of-the-art benchmarks.
( 2 min )
Inferring biological relationships from cellular phenotypes in high-content
microscopy screens provides significant opportunity and challenge in biological
research. Prior results have shown that deep vision models can capture
biological signal better than hand-crafted features. This work explores how
weakly supervised and self-supervised deep learning approaches scale when
training larger models on larger datasets. Our results show that both CNN- and
ViT-based masked autoencoders significantly outperform weakly supervised
models. At the high end of our scale, a ViT-L/8 trained on over 3.5 billion unique crops sampled from 95 million microscopy images achieves relative
improvements as high as 28% over our best weakly supervised models at inferring
known biological relationships curated from public databases.
( 2 min )
We prove a fundamental limitation on the efficiency of a wide class of
Reinforcement Learning (RL) algorithms. This limitation applies to model-free
RL methods as well as a broad range of model-based methods, such as planning
with tree search.
Under an abstract definition of this class, we provide a family of RL
problems for which these methods suffer a lower bound, exponential in the horizon, on the number of interactions with the environment needed to find an optimal behavior. However, there exists a method, not tailored to this specific family
of problems, which can efficiently solve the problems in the family.
In contrast, our limitation does not apply to several types of methods
proposed in the literature, for instance, goal-conditioned methods or other
algorithms that construct an inverse dynamics model.
( 2 min )
This work reports the empirical performance of an automated medical landmark
detection method for predicting clinical markers in hip radiograph images.
Notably, the detection method was trained using a label-only augmentation
scheme; our results indicate that this form of augmentation outperforms
traditional data augmentation and produces highly sample efficient estimators.
We train a generic U-Net-based architecture under a curriculum consisting of
two phases: initially relaxing the landmarking task by enlarging the label
points to regions, then gradually eroding these label regions back to the base
task. We measure the benefits of this approach on six datasets of radiographs
with gold-standard expert annotations.
( 2
min )
In this paper, we introduce a novel analysis of neural networks based on
geometric (Clifford) algebra and convex optimization. We show that optimal
weights of deep ReLU neural networks are given by the wedge product of training
samples when trained with standard regularized loss. Furthermore, the training
problem reduces to convex optimization over wedge product features, which
encode the geometric structure of the training dataset. This structure is given
in terms of signed volumes of triangles and parallelotopes generated by data
vectors. The convex problem finds a small subset of samples via $\ell_1$
regularization to discover only relevant wedge product features. Our analysis
provides a novel perspective on the inner workings of deep neural networks and
sheds light on the role of the hidden layers.
( 2
min )
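The geometric objects in this abstract are concrete: when the number of data vectors equals the ambient dimension, the signed volume of the parallelotope they span is just a determinant (for fewer vectors one would use the Gram determinant). A minimal illustration of the underlying geometry, not the paper's method:

```python
import numpy as np

def signed_volume(vectors):
    """Signed k-volume of the parallelotope spanned by k vectors in R^k."""
    return float(np.linalg.det(np.stack(vectors)))

v1 = np.array([1.0, 0.0])
v2 = np.array([0.0, 2.0])
print(signed_volume([v1, v2]))  # → 2.0; swapping the vectors flips the sign
# The signed area of the triangle (0, v1, v2) is half of this value.
```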
We propose a novel framework that combines deep generative time series models
with decision theory for generating personalized treatment strategies. It
leverages historical patient trajectory data to jointly learn the generation of
realistic personalized treatment and future outcome trajectories through deep
generative time series models. In particular, our framework enables the
generation of novel multivariate treatment strategies tailored to the
personalized patient history and trained for optimal expected future outcomes
based on conditional expected utility maximization. We demonstrate our
framework by generating personalized insulin treatment strategies and blood
glucose predictions for hospitalized diabetes patients, showcasing the
potential of our approach for generating improved personalized treatment
strategies. Keywords: deep generative model, probabilistic decision support,
personalized treatment generation, insulin and blood glucose prediction
( 2
min )
In this analysis, we use a K-nearest neighbors (KNN) model to conduct crop segmentation, and we compare these results with ground truth imagery on an agricultural region. Our results reveal that the classification from the KNN model is more accurately representative of the state of the current crop field in 2017 than the ground truth classification data from 2015. These results are a testament to the power of Planet’s high-cadence geospatial imagery. Agricultural fields change often, sometimes multiple times a season, and having high-frequency satellite imagery available to observe and analyze this land can provide immense value to our understanding of agricultural land and quickly-changing environments.
( 15
min )
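A per-pixel KNN crop classifier of the kind described can be sketched in a few lines with scikit-learn; the band features, class labels, and imagery below are hypothetical placeholders, not Planet's data or pipeline:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical per-pixel training set: 4 spectral bands (e.g. R, G, B, NIR)
# with crop-type labels taken from a reference classification map.
rng = np.random.default_rng(0)
X_train = rng.random((500, 4))     # 500 labeled pixels, 4 band values each
y_train = rng.integers(0, 3, 500)  # 3 hypothetical crop classes

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Classify every pixel of a new scene (a small random stand-in "image").
scene = rng.random((10, 10, 4))
labels = knn.predict(scene.reshape(-1, 4)).reshape(10, 10)
print(labels.shape)  # one crop class per pixel
```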
In a talk, now available online, NVIDIA Chief Scientist Bill Dally describes a tectonic shift in how computer performance gets delivered in a post-Moore’s law era. Each new processor requires ingenuity and effort inventing and validating fresh ingredients, he said in a recent keynote address at Hot Chips, an annual gathering of chip and systems […]
( 6
min )
This post is co-written with Ilan Geller and Shuyu Yang from Accenture. Enterprises today face major challenges when it comes to using their information and knowledge bases for both internal and external business operations. With constantly evolving operations, processes, policies, and compliance requirements, it can be extremely difficult for employees and customers to stay up […]
( 8
min )
We’re excited to announce that Amazon SageMaker Canvas now offers a quicker and more user-friendly way to create machine learning models for time-series forecasting. SageMaker Canvas is a visual point-and-click service that enables business analysts to generate accurate machine learning (ML) models without requiring any machine learning experience or having to write a single line of code. SageMaker […]
( 7
min )
In the world of data-driven decision-making, time series forecasting is key in enabling businesses to use historical data patterns to anticipate future outcomes. Whether you are working in asset risk management, trading, weather prediction, energy demand forecasting, vital sign monitoring, or traffic analysis, the ability to forecast accurately is crucial for success. In these applications, […]
( 10
min )
In the rapidly evolving world of AI and machine learning (ML), foundation models (FMs) have shown tremendous potential for driving innovation and unlocking new use cases. However, as organizations increasingly harness the power of FMs, concerns surrounding data privacy, security, added cost, and compliance have become paramount. Regulated and compliance-oriented industries, such as financial services, […]
( 13
min )
Companies use time series forecasting to make core planning decisions that help them navigate through uncertain futures. This post is meant to address supply chain stakeholders, who share a common need of determining how many finished goods are needed over a mixed variety of planning time horizons. In addition to planning how many units of […]
( 11
min )
From startups to enterprises, organizations of all sizes are getting started with generative AI. They want to capitalize on generative AI and translate the momentum from betas, prototypes, and demos into real-world productivity gains and innovations. But what do organizations need to bring generative AI into the enterprise and make it real? When we talk […]
( 13
min )
The wait is over. GeForce NOW Ultimate members can experience Cyberpunk 2077: Phantom Liberty on GOG.com at full GeForce RTX 4080 quality, with support for NVIDIA DLSS 3.5 technology. It’s part of an action-packed GFN Thursday, with 26 more games joining the cloud gaming platform’s library, including Quake II from id Software. […]
( 8
min )
Powerful large-scale AI models like GPT-4 are showing dramatic improvements in reasoning, problem-solving, and language capabilities. This marks a phase change for artificial intelligence—and a signal of accelerating progress to come. In this Microsoft Research Podcast series, AI scientist and engineer Ashley Llorens hosts conversations with his collaborators and colleagues about what these models—and the […]
The post AI Frontiers: Measuring and mitigating harms with Hanna Wallach appeared first on Microsoft Research.
( 29
min )
The iconic sci-fi opera “VALIS,” first composed by Professor Tod Machover in 1987, reboots at MIT for a new generation.
( 11
min )
Inspired by physics, a new generative model PFGM++ outperforms diffusion models in image generation.
( 10
min )
The Amazon EU Design and Construction (Amazon D&C) team is the engineering team designing and constructing Amazon Warehouses across Europe and the MENA region. The design and deployment processes of projects involve many types of Requests for Information (RFIs) about engineering requirements regarding Amazon and project-specific guidelines. These requests range from simple retrieval of baseline […]
( 13
min )
MDaudit provides a cloud-based billing compliance and revenue integrity software as a service (SaaS) platform to more than 70,000 healthcare providers and 1,500 healthcare facilities, ensuring healthcare customers maintain regulatory compliance and retain revenue. Working with the top 60+ US healthcare networks, MDaudit needs to be able to scale its artificial intelligence (AI) capabilities to […]
( 5
min )
DENZA, the luxury electric-vehicle brand and joint venture between BYD and Mercedes-Benz, is debuting new intelligent driving features for its entire N7 model lineup, powered by the NVIDIA DRIVE Orin system-on-a-chip (SoC). The N7 series was introduced earlier this year as a family of spacious five-seater SUVs for commuters looking to sport a deluxe EV […]
( 5
min )
Medical-device company Invenio Imaging is developing technology that enables surgeons to evaluate tissue biopsies in the operating room, immediately after samples are collected — providing in just three minutes AI-accelerated insights that would otherwise take weeks to obtain from a pathology lab. In a surgical biopsy, a medical professional removes samples of cells or tissue […]
( 6
min )
As generative AI sweeps across corporate boardrooms around the world, global telecommunications companies are exploring how to cost-effectively deliver many of these new AI applications to the edge over 5G and upcoming 6G networks. Telcos plan to deploy over 17 million 5G microcells and towers worldwide by 2025. Building, managing and optimizing this new infrastructure […]
( 6
min )
Chunked prefills & decode-maximal batching boost LLM inference; DragNUWA combines text, image, and trajectory for fine-grained video content control; reconstructing images from human brain signals; structural inequalities in creator-audience relationships.
The post Research Focus: Week of September 25, 2023 appeared first on Microsoft Research.
( 9
min )
Talk about a Grand Slam. Denny’s CEO Kelli Valade was joined Tuesday by NVIDIA CEO Jensen Huang to unveil a plaque at the Silicon Valley Denny’s where NVIDIA’s founders hatched their idea for a chip that would enable realistic 3D graphics on personal computers. “This is a place where we fuel ideas. Your story is […]
( 6
min )
From gaming to creating to everyday productivity, NVIDIA RTX graphics cards feature specialized Tensor Cores that deliver cutting-edge performance and transformative capabilities for AI.
( 7
min )
As machine learning (ML) goes mainstream and gains wider adoption, ML-powered inference applications are becoming increasingly common to solve a range of complex business problems. The solution to these complex business problems often requires using multiple ML models and steps. This post shows you how to build and host an ML application with custom containers […]
( 13
min )
This post was co-authored with Daniele Chiappalupi, participant of the AWS student Hackathon team at ETH Zürich. Everyone can easily get started with machine learning (ML) using Amazon SageMaker JumpStart. In this post, we show you how a university Hackathon team used SageMaker JumpStart to quickly build an application that helps users identify and remove […]
( 9
min )
We’re at an exciting inflection point in the widespread adoption of machine learning (ML), and we believe most customer experiences and applications will be reinvented with generative AI. Generative AI can create new content and ideas, including conversations, stories, images, videos, and music. Like most AI, generative AI is powered by ML models—very large models […]
( 12
min )
E-commerce has brought greater technology and convenience to consumers globally, but fraud remains a persistent problem. Merchants and platforms fight fraud to protect their businesses and customers, and anomaly detection is a powerful tool for identifying irregular patterns and potential fraud. This article explores how anomaly detection is used in fraud detection for e-commerce and discusses different…
The post In fraud detection for e-commerce: How does anomaly detection fit in and what are the key approaches? appeared first on Data Science Central.
( 22
min )
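One common anomaly-detection approach in this setting is an isolation forest, which flags points that random splits isolate quickly. The sketch below uses entirely synthetic order features, not any method or data from the article:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# Synthetic order features: [order_value, item_count, account_age_days].
normal = rng.normal(loc=[50, 3, 400], scale=[20, 2, 100], size=(1000, 3))
fraud = rng.normal(loc=[900, 40, 2], scale=[100, 5, 1], size=(10, 3))
X = np.vstack([normal, fraud])

# Extreme orders are isolated in few random splits and flagged as anomalies.
clf = IsolationForest(contamination=0.01, random_state=0).fit(X)
scores = clf.predict(X)  # -1 = anomaly, 1 = normal
print(int((scores == -1).sum()))
```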
Thanks to the internet, you can now easily expand your reach and engage with diverse audiences wherever they are. However, this opportunity raises an important question: how can you localize your web content and maintain the security and privacy of sensitive data? This article comprehensively explores the best practices that will help you maintain data…
The post The essential guide on data security and privacy in web localization appeared first on Data Science Central.
( 22
min )
Microsoft researchers are introducing AutoGen, a framework for simplifying the orchestration, optimization, and automation of workflows for large language model (LLM) applications—potentially transforming and extending what LLMs can do.
The post AutoGen: Enabling next-generation large language model applications appeared first on Microsoft Research.
( 10
min )
ChatGPT, Bard, GPT-4, and the like are often pitched as ways to retrieve information. The problem is they'll "retrieve" whatever you ask for, whether or not it exists.
Tumblr user @indigofoxpaws sent me a few screenshots where they'd asked ChatGPT for an explanation of
( 3
min )
AI Weirdness: the strange side of machine learning
( 2
min )
Adversarial examples, deliberately crafted using small perturbations to fool
deep neural networks, were first studied in image processing and more recently
in NLP. While approaches to detecting adversarial examples in NLP have largely
relied on search over input perturbations, image processing has seen a range of
techniques that aim to characterise adversarial subspaces over the learned
representations.
In this paper, we adapt two such approaches to NLP, one based on nearest
neighbors and influence functions and one on Mahalanobis distances. The former
in particular produces a state-of-the-art detector when compared against
several strong baselines; moreover, the novel use of influence functions
provides insight into how adversarial example subspaces in NLP relate to
those in image processing, and also how they differ depending on the
kind of NLP task.
( 2
min )
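A minimal sketch of the Mahalanobis-distance idea: fit class-conditional Gaussians with a tied covariance to clean representations, then score an input by its minimum class distance, with large distances suggesting an adversarial (off-manifold) example. The features below are synthetic stand-ins for a model's learned representations:

```python
import numpy as np

def mahalanobis_scores(feats, class_means, cov_inv):
    """Min class-conditional Mahalanobis distance; higher = more suspicious."""
    dists = []
    for mu in class_means:
        d = feats - mu
        dists.append(np.einsum('ij,jk,ik->i', d, cov_inv, d))
    return np.min(np.stack(dists, axis=1), axis=1)

rng = np.random.default_rng(0)
# Hypothetical penultimate-layer features for two classes.
c0 = rng.normal(0.0, 1.0, (200, 8))
c1 = rng.normal(3.0, 1.0, (200, 8))
train = np.vstack([c0, c1])
means = [c0.mean(0), c1.mean(0)]
# Tied covariance of the class-centered training features.
centered = train - np.repeat(np.stack(means), 200, axis=0)
cov_inv = np.linalg.inv(np.cov(centered, rowvar=False))

clean = rng.normal(0.0, 1.0, (50, 8))  # looks like class 0
adv = rng.normal(1.5, 1.0, (50, 8))    # off-manifold points between classes
print(mahalanobis_scores(adv, means, cov_inv).mean() >
      mahalanobis_scores(clean, means, cov_inv).mean())
```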
Rapid and accurate identification of venous thromboembolism (VTE), a severe
cardiovascular condition comprising deep vein thrombosis (DVT) and pulmonary
embolism (PE), is important for effective treatment. Leveraging natural
language processing (NLP) on radiology reports, automated methods have shown
promising advances in identifying VTE events from retrospective data cohorts
and in aiding clinical experts who review radiology reports. However,
effectively training deep learning (DL) and NLP models is challenging due to
limited labeled medical text data, the complexity and heterogeneity of
radiology reports, and data imbalance. This study proposes novel
combinations of DL methods, along with data augmentation, adaptive
pre-trained NLP model selection, and a clinical expert NLP rule-based
classifier, to improve the accuracy of VTE identification in unstructured
(free-text) radiology reports. Our experimental results demonstrate the model's
efficacy, achieving an impressive 97\% accuracy and 97\% F1 score in predicting
DVT, and an outstanding 98.3\% accuracy and 98.4\% F1 score in predicting PE.
These findings emphasize the model's robustness and its potential to
significantly contribute to VTE research.
( 2
min )
Action scene understanding in soccer is a challenging task due to the complex
and dynamic nature of the game, as well as the interactions between players.
This article provides a comprehensive overview of this task divided into action
recognition, spotting, and spatio-temporal action localization, with a
particular emphasis on the modalities used and multimodal methods. We explore
the publicly available data sources and metrics used to evaluate models'
performance. The article reviews recent state-of-the-art methods that leverage
deep learning techniques and traditional methods. We focus on multimodal
methods, which integrate information from multiple sources, such as video and
audio data, and also those that represent one source in various ways. The
advantages and limitations of methods are discussed, along with their potential
for improving the accuracy and robustness of models. Finally, the article
highlights some of the open research questions and future directions in the
field of soccer action recognition, including the potential for multimodal
methods to advance this field. Overall, this survey provides a valuable
resource for researchers interested in the field of action scene understanding
in soccer.
( 2
min )
This paper presents a Hierarchical Reinforcement Learning methodology
tailored for optimizing CubeSat task scheduling in Low Earth Orbits (LEO).
Incorporating a high-level policy for global task distribution and a low-level
policy for real-time adaptations as a safety mechanism, our approach integrates
the Similarity Attention-based Encoder (SABE) for task prioritization and an
MLP estimator for energy consumption forecasting. Integrating this mechanism
creates a safe and fault-tolerant system for CubeSat task scheduling.
Simulation results validate the superior convergence and task success rate of
the Hierarchical Reinforcement Learning approach, outperforming both the
MADDPG model and traditional random scheduling across multiple CubeSat
configurations.
( 2
min )
A common formulation of constrained reinforcement learning involves multiple
rewards that must individually accumulate to given thresholds. In this class of
problems, we show a simple example in which the desired optimal policy cannot
be induced by any weighted linear combination of rewards. Hence, there exist
constrained reinforcement learning problems for which neither regularized nor
classical primal-dual methods yield optimal policies. This work addresses this
shortcoming by augmenting the state with Lagrange multipliers and
reinterpreting primal-dual methods as the portion of the dynamics that drives
the multipliers' evolution. This approach provides a systematic state
augmentation procedure that is guaranteed to solve reinforcement learning
problems with constraints. Thus, as we illustrate by an example, while previous
methods can fail at finding optimal policies, running the dual dynamics while
executing the augmented policy yields an algorithm that provably samples
actions from the optimal policy.
( 2
min )
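The dual dynamics described above can be illustrated on a tiny constrained bandit where no deterministic policy is optimal: running the multiplier update while acting greedily against the multiplier-augmented objective recovers the optimal action frequencies. This toy example is ours, not the paper's:

```python
import numpy as np

# Two actions: a=0 gives reward 1.0 at constraint cost 1.0;
# a=1 gives reward 0.5 at cost 0.0. Constraint: average cost <= 0.3.
# The optimal (stochastic) policy plays a=0 with probability 0.3.
r = np.array([1.0, 0.5])
c = np.array([1.0, 0.0])
budget, eta, lam = 0.3, 0.01, 0.0

actions = []
for t in range(20000):
    a = int(np.argmax(r - lam * c))              # greedy w.r.t. augmented objective
    actions.append(a)
    lam = max(0.0, lam + eta * (c[a] - budget))  # dual dynamics on the multiplier

frac_a0 = 1 - np.mean(actions[5000:])  # empirical frequency of a=0 after burn-in
print(round(frac_a0, 2))               # ≈ 0.3, matching the constrained optimum
```

The multiplier settles into a small oscillation around the critical value at which the two actions tie, and the time-average of the greedy actions matches the optimal stochastic policy.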
Transformer has been considered the dominant neural architecture in NLP and
CV, mostly under supervised settings. Recently, a similar surge of using
Transformers has appeared in the domain of reinforcement learning (RL), but it
is faced with unique design choices and challenges brought by the nature of RL.
However, the evolution of Transformers in RL has not yet been well unraveled.
In this paper, we seek to systematically review motivations and progress on
using Transformers in RL, provide a taxonomy on existing works, discuss each
sub-field, and summarize future prospects.
( 2
min )
We formulate a data-independent latent space regularisation constraint for
general unsupervised autoencoders. The regularisation rests on sampling the
autoencoder Jacobian at the Legendre nodes, the centres of Gauss-Legendre
quadrature. Revisiting this classic tool enables us to prove that regularised
autoencoders ensure a one-to-one re-embedding of the initial data manifold
into its latent representation. Demonstrations show that previously proposed
regularisation strategies, such as contractive autoencoding, cause topological
defects even for simple examples, as do convolution-based (variational)
autoencoders. In contrast, topological preservation is already ensured by
standard multilayer perceptron neural networks when regularised with our
technique. This observation extends from the classic FashionMNIST dataset up
to real-world encoding problems for MRI brain scans, suggesting that, across
disciplines, reliable low-dimensional representations of complex
high-dimensional datasets can be delivered by this regularisation technique.
( 2
min )
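A sketch of the sampling idea: obtain the Gauss-Legendre nodes and weights, evaluate the Jacobian of a decoder at those latent nodes (here by finite differences on a toy 1-D decoder), and form a quadrature estimate of the integrated squared Jacobian norm. The decoder and penalty form are illustrative assumptions, not the paper's exact regulariser:

```python
import numpy as np

# Gauss-Legendre nodes and weights on [-1, 1]: the sampling points at which
# the autoencoder Jacobian would be evaluated.
nodes, weights = np.polynomial.legendre.leggauss(8)

def decoder(z):
    # Hypothetical decoder mapping a 1-D latent to 3-D outputs.
    return np.stack([np.sin(z), np.cos(z), z ** 2], axis=-1)

def jacobian_penalty(f, z_nodes, w, eps=1e-5):
    """Quadrature estimate of the integrated squared Jacobian norm of f."""
    J = (f(z_nodes + eps) - f(z_nodes - eps)) / (2 * eps)  # central differences
    return float(np.sum(w * np.sum(J ** 2, axis=-1)))

penalty = jacobian_penalty(decoder, nodes, weights)
print(penalty)  # quadrature value of the integral of |J(z)|^2 over [-1, 1]
```

For this decoder, |J(z)|² = cos²z + sin²z + 4z² = 1 + 4z², so the integral over [-1, 1] is 2 + 8/3, which the 8-node quadrature reproduces essentially exactly.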
Indoor localization is in increasing demand for various cutting-edge
technologies, such as virtual/augmented reality and smart homes. Traditional
model-based localization suffers from significant computational overhead, so
fingerprint localization is attracting increasing attention, as it incurs a
lower computation cost once the fingerprint database is built. However, the
accuracy of indoor localization is limited by the complicated indoor
environment, which introduces multipath signal refraction. In this paper, we
provide a scheme to improve the accuracy of indoor fingerprint localization
in the frequency domain by predicting the channel state information (CSI)
values of another transmitting channel and splicing the multi-band
information together to obtain more precise localization results. We tested
our proposed scheme on COST 2100 simulation data and real-time orthogonal
frequency division multiplexing (OFDM) WiFi data collected from an office
scenario.
( 2
min )
Consider an online convex optimization problem where the loss functions are
self-concordant barriers, smooth relative to a convex function $h$, and
possibly non-Lipschitz. We analyze the regret of online mirror descent with
$h$. Then, based on the result, we prove the following in a unified manner.
Denote by $T$ the time horizon and $d$ the parameter dimension. 1. For online
portfolio selection, the regret of $\widetilde{\text{EG}}$, a variant of
exponentiated gradient due to Helmbold et al., is $\tilde{O} ( T^{2/3} d^{1/3}
)$ when $T > 4 d / \log d$. This improves on the original $\tilde{O} ( T^{3/4}
d^{1/2} )$ regret bound for $\widetilde{\text{EG}}$. 2. For online portfolio
selection, the regret of online mirror descent with the logarithmic barrier is
$\tilde{O}(\sqrt{T d})$. The regret bound is the same as that of Soft-Bayes due
to Orseau et al. up to logarithmic terms. 3. For online learning quantum states
with the logarithmic loss, the regret of online mirror descent with the
log-determinant function is also $\tilde{O} ( \sqrt{T d} )$. Its per-iteration
time is shorter than all existing algorithms we know.
( 3
min )
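For context, the plain exponentiated-gradient update for online portfolio selection (of which $\widetilde{\text{EG}}$ is a variant with extra mixing toward the uniform portfolio) can be sketched as follows; the returns here are synthetic and the step size is an arbitrary illustrative choice:

```python
import numpy as np

def eg_portfolio(returns, eta=0.05):
    """Exponentiated-gradient update for online portfolio selection (sketch).

    returns: (T, d) array of per-round price relatives. The round-t loss is
    -log(x . r_t); EG multiplies each weight by exp(-eta * grad_i) and
    renormalises back onto the simplex.
    """
    T, d = returns.shape
    x = np.full(d, 1.0 / d)   # start at the uniform portfolio
    wealth = 1.0
    for r_t in returns:
        wealth *= x @ r_t
        grad = -r_t / (x @ r_t)       # gradient of -log(x . r_t)
        x = x * np.exp(-eta * grad)
        x /= x.sum()
    return wealth

rng = np.random.default_rng(1)
rets = 1.0 + 0.01 * rng.standard_normal((1000, 5))
print(eg_portfolio(rets) > 0)
```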
We study the problem of in-context learning (ICL) with large language models
(LLMs) on private datasets. This scenario poses privacy risks, as LLMs may leak
or regurgitate the private examples demonstrated in the prompt. We propose a
novel algorithm that generates synthetic few-shot demonstrations from the
private dataset with formal differential privacy (DP) guarantees, and show
empirically that it can achieve effective ICL. We conduct extensive experiments
on standard benchmarks and compare our algorithm with non-private ICL and
zero-shot solutions. Our results demonstrate that our algorithm can achieve
competitive performance with strong privacy levels. These results open up new
possibilities for ICL with privacy protection for a broad range of
applications.
( 2
min )
Initialization of neural network weights plays a pivotal role in determining
their performance. Feature Imitating Networks (FINs) offer a novel strategy by
initializing weights to approximate specific closed-form statistical features,
setting a promising foundation for deep learning architectures. While the
applicability of FINs has been chiefly tested in biomedical domains, this study
extends its exploration into other time series datasets. Three different
experiments are conducted in this study to test the applicability of imitating
Tsallis entropy for performance enhancement: Bitcoin price prediction, speech
emotion recognition, and chronic neck pain detection. For the Bitcoin price
prediction, models embedded with FINs reduced the root mean square error by
around 1000 compared to the baseline. In the speech emotion recognition task,
the FIN-augmented model increased classification accuracy by over 3 percent.
Lastly, in the CNP detection experiment, an improvement of about 7 percent was
observed compared to established classifiers. These findings validate the broad
utility and potency of FINs in diverse applications.
( 2
min )
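The FIN idea can be sketched as a two-step recipe: compute the closed-form feature (here Tsallis entropy), then pretrain a small network to imitate it so its weights can seed a task model. Everything below — the network size, data, and training loop — is an illustrative stand-in for the paper's setup:

```python
import numpy as np

def tsallis_entropy(p, q=2.0):
    """Closed-form Tsallis entropy of discrete distributions (rows of p)."""
    return (1.0 - np.sum(p ** q, axis=-1)) / (q - 1.0)

rng = np.random.default_rng(0)
raw = rng.random((2000, 16))
P = raw / raw.sum(axis=1, keepdims=True)  # each row is a distribution
y = tsallis_entropy(P)

# Imitation pretraining: a one-hidden-layer network learns to reproduce the
# feature; its weights would then initialise a larger task network.
W1 = 0.1 * rng.standard_normal((16, 32)); b1 = np.zeros(32)
W2 = 0.1 * rng.standard_normal((32, 1));  b2 = np.zeros(1)
lr = 0.2
for _ in range(500):
    h = np.maximum(P @ W1 + b1, 0.0)       # ReLU hidden layer
    pred = (h @ W2 + b2).ravel()
    err = pred - y
    loss = np.mean(err ** 2)
    g_pred = 2 * err[:, None] / len(y)     # dL/dpred
    gW2 = h.T @ g_pred;  gb2 = g_pred.sum(0)
    g_h = (g_pred @ W2.T) * (h > 0)
    gW1 = P.T @ g_h;     gb1 = g_h.sum(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2
print(float(loss))  # small: the network now approximates the feature
```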
The decision-making process in real-world implementations has been affected
by a growing reliance on data-driven models. We investigated the synergetic
pattern between the data-driven methods, empirical domain knowledge, and
first-principles simulations. We showed the potential risk of biased results
when using data-driven models without causal analysis. Using a case study
assessing the implication of several design solutions on the energy consumption
of a building, we proved the necessity of causal analysis during the
data-driven modeling process. We concluded that: (a) Data-driven models'
accuracy assessment or domain knowledge screening may not rule out biased and
spurious results; (b) Data-driven models' feature selection should involve
careful consideration of causal relationships, especially colliders; (c) Causal
analysis results can be used as an aid to first-principles simulation design
and parameter checking to avoid cognitive biases. We proved the benefits of
causal analysis when applied to data-driven models in building engineering.
( 2
min )
This work describes the TrueLearn Python library, which contains a family of
online learning Bayesian models for building educational (or more generally,
informational) recommendation systems. This family of models was designed
following the "open learner" concept, using humanly-intuitive user
representations. For the sake of interpretability and putting the user in
control, the TrueLearn library also contains different representations to help
end-users visualise the learner models, which may in the future facilitate user
interaction with their own models. Together with the library, we include a
previously publicly released implicit feedback educational dataset with
evaluation metrics to measure the performance of the models. The extensive
documentation and coding examples make the library highly accessible to both
machine learning developers and educational data mining and learning analytics
practitioners. The library and the support documentation with examples are
available at https://truelearn.readthedocs.io/en/latest.
( 2
min )
Efficient training of large-scale graph neural networks (GNNs) has been
studied with a specific focus on reducing their memory consumption. Work by Liu
et al. (2022) proposed extreme activation compression (EXACT) which
demonstrated drastic reduction in memory consumption by performing quantization
of the intermediate activation maps down to using INT2 precision. They showed
little to no reduction in performance while achieving large reductions in GPU
memory consumption. In this work, we present an improvement to the EXACT
strategy by using block-wise quantization of the intermediate activation maps.
We experimentally analyze different block sizes and show further reduction in
memory consumption (>15%), and runtime speedup per epoch (about 5%) even when
performing extreme extents of quantization with similar performance trade-offs
as with the original EXACT. Further, we present a correction to the assumptions
on the distribution of intermediate activation maps in EXACT (assumed to be
uniform) and show improved variance estimations of the quantization and
dequantization steps.
( 2
min )
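The block-wise scheme can be sketched as follows: split the flattened activations into fixed-size blocks and quantize each block with its own offset and scale, so a single outlier only degrades one block rather than the whole tensor. This is an illustrative reimplementation, not the authors' code:

```python
import numpy as np

def blockwise_quant(x, bits=2, block=64):
    """Block-wise uniform quantization of an activation map (sketch)."""
    levels = 2 ** bits - 1
    flat = x.ravel()
    pad = (-len(flat)) % block
    flat = np.pad(flat, (0, pad))
    blocks = flat.reshape(-1, block)
    lo = blocks.min(axis=1, keepdims=True)          # per-block offset
    scale = (blocks.max(axis=1, keepdims=True) - lo) / levels
    scale[scale == 0] = 1.0                          # constant blocks
    q = np.round((blocks - lo) / scale)              # INT2 codes in {0..3}
    deq = (q * scale + lo).ravel()[:x.size].reshape(x.shape)
    return q.astype(np.int8), deq

rng = np.random.default_rng(0)
act = rng.standard_normal((128, 128)).astype(np.float32)
codes, approx = blockwise_quant(act)
err_block = np.abs(act - approx).mean()
_, approx_global = blockwise_quant(act, block=act.size)  # per-tensor baseline
print(err_block < np.abs(act - approx_global).mean())    # block-wise is finer
```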
In this paper, we study the effect of popularity degradation bias in the
context of local music recommendations. Specifically, we examine how accurate
two top-performing recommendation algorithms, Weighted Regularized Matrix
Factorization (WRMF) and Multinomial Variational Autoencoder (Mult-VAE), are at
recommending artists as a function of artist popularity. We find that both
algorithms improve recommendation performance for more popular artists and, as
such, exhibit popularity degradation bias. While both algorithms produce a
similar level of performance for more popular artists, Mult-VAE shows better
relative performance for less popular artists. This suggests that this
algorithm should be preferred for local (long-tail) music artist
recommendation.
( 2
min )
Social science often relies on surveys of households and individuals. Dozens
of such surveys are regularly administered by the U.S. government. However,
they field independent, unconnected samples with specialized questions,
limiting research questions to those that can be answered by a single survey.
The fusionACS project seeks to integrate data from multiple U.S. household
surveys by statistically "fusing" variables from "donor" surveys onto American
Community Survey (ACS) microdata. This results in an integrated microdataset of
household attributes and well-being dimensions that can be analyzed to address
research questions in ways that are not currently possible. The presented data
comprise the fusion onto the ACS of select donor variables from the Residential
Energy Consumption Survey (RECS) of 2015, the National Household Transportation
Survey (NHTS) of 2017, the American Housing Survey (AHS) of 2019, and the
Consumer Expenditure Survey - Interview (CEI) for the years 2015-2019. The
underlying statistical techniques are included in an open-source $R$ package,
fusionModel, that provides generic tools for the creation, analysis, and
validation of fused microdata.
( 2
min )
Simple regret minimization is a critical problem in learning optimal
treatment assignment policies across various domains, including healthcare and
e-commerce. However, it remains understudied in the contextual bandit setting.
We propose a new family of computationally efficient bandit algorithms for the
stochastic contextual bandit settings, with the flexibility to be adapted for
cumulative regret minimization (with near-optimal minimax guarantees) and
simple regret minimization (with SOTA guarantees). Furthermore, our algorithms
adapt to model misspecification and extend to the continuous arm settings.
These advantages come from constructing and relying on "conformal arm sets"
(CASs), which provide a set of arms at every context that encompass the
context-specific optimal arm with some probability across the context
distribution. Our positive results on simple and cumulative regret guarantees
are contrasted by a negative result, which shows that an algorithm cannot
achieve instance-dependent simple regret guarantees while simultaneously
achieving minimax optimal cumulative regret guarantees.
( 2
min )
Discover the obstacles hindering seamless AI adoption in financial services and gain actionable insights to navigate regulatory compliance, data security, organizational change, and more.
The post AI in finance: Addressing hurdles on the path to transformation appeared first on Data Science Central.
( 22
min )
Posted by Cheng-Yu Hsieh, Student Researcher, and Chen-Yu Lee, Research Scientist, Cloud AI Team
Large language models (LLMs) have enabled a new data-efficient learning paradigm wherein they can be used to solve unseen new tasks via zero-shot or few-shot prompting. However, LLMs are challenging to deploy for real-world applications due to their sheer size. For instance, serving a single 175-billion-parameter LLM requires at least 350GB of GPU memory using specialized infrastructure, not to mention that today's state-of-the-art LLMs are composed of over 500 billion parameters. Such computational requirements are inaccessible for many research teams, especially for applications that require low-latency performance.
To circumvent these deployment challenges, practitioners often choose to deplo…
( 93
min )
In this post, we discuss how United Airlines, in collaboration with the Amazon Machine Learning Solutions Lab, built an active learning framework on AWS to automate the processing of passenger documents. “In order to deliver the best flying experience for our passengers and make our internal business process as efficient as possible, we have developed […]
( 10
min )
To add to our guidance for optimizing deep learning workloads for sustainability on AWS, this post provides recommendations that are specific to generative AI workloads. In particular, we provide practical best practices for different customization scenarios, including training models from scratch, fine-tuning with additional data using full or parameter-efficient techniques, Retrieval Augmented Generation (RAG), and prompt engineering.
( 10
min )
The NVIDIA Studio laptop lineup is expanding with the new Microsoft Surface Laptop Studio 2, powered by GeForce RTX 4060, GeForce RTX 4050 or NVIDIA RTX 2000 Ada Generation Laptop GPUs, providing powerful performance and versatility for creators.
( 8
min )
Gone are the days when AI was the domain of sprawling data centers or elite researchers. For GeForce RTX users, AI is now running on your PC. It’s personal, enhancing every keystroke, every frame and every moment. Gamers are already enjoying the benefits of AI in over 300 RTX games. Meanwhile, content creators have access Read article >
( 8
min )
For seasoned 3D artists and budding digital creation enthusiasts alike, an alpha version of the popular 3D software Blender is elevating creative journeys.
( 7
min )
NVIDIA founder and CEO Jensen Huang will highlight the newest in generative AI and cloud computing at the NVIDIA AI Summit in Tel Aviv from Oct. 15-16. The two-day summit is set to attract more than 2,500 developers, researchers and decision-makers from across one of the world’s most vibrant technology hubs. With over 6,000 startups, Read article >
( 5
min )
Time to get the gang back together — PAYDAY 3 streams on GeForce NOW this week. It’s one of 11 titles joining the cloud this week, including Party Animals. The Perfect Heist PAYDAY 3 is the highly anticipated sequel to one of the world’s most popular co-op shooters. Step out of retirement and back into Read article >
( 5
min )
A visionary entrepreneur and innovator, Yoon will focus on entrepreneurship, supporting female engineers, and fostering inclusive innovation.
( 8
min )
Mercedes-Benz is using digital twins for production with help from NVIDIA Omniverse, a platform for developing Universal Scene Description (OpenUSD) applications to design, collaborate, plan and operate manufacturing and assembly facilities. Mercedes-Benz’s new production techniques will bring its next-generation vehicle portfolio into its manufacturing facilities operating in Rastatt, Germany; Kecskemét, Hungary; and Beijing, China — Read article >
( 6
min )
In this post, we demonstrate one of the many options that you have to take advantage of AWS’s broadest and deepest set of AI/ML capabilities in a multicloud environment. We show how you can build and train an ML model in AWS and deploy the model in another platform. We train the model using Amazon SageMaker, store the model artifacts in Amazon Simple Storage Service (Amazon S3), and deploy and run the model in Azure.
( 13
min )
With generative AI and large language models (LLMs) driving groundbreaking innovations, the computational demands for training and inference are skyrocketing. These modern-day generative AI applications demand full-stack accelerated compute, starting with state-of-the-art infrastructure that can handle massive workloads with speed and accuracy. To help meet this need, Oracle Cloud Infrastructure today announced general availability of Read article >
( 6
min )
Editor’s note: This post is a part of our Meet the Omnivore series, which features individual creators and developers who use NVIDIA Omniverse and OpenUSD to accelerate their 3D workflows and create virtual worlds. As a student at the Queensland University of Technology (QUT) in Australia, Emily Boehmer was torn between pursuing the creative arts Read article >
( 7
min )
The MIT and Accenture Convergence Initiative for Industry and Technology announces the 2023-24 graduate fellows.
( 9
min )
Inventions in medical imaging, aircrew scheduling, data security, and quantum networking are named among the year’s most innovative new products.
( 11
min )
Where do you start if you want to build a data analytics function from the ground up? As an analytics leader at a startup, you will need to make several important decisions early on to build an effective team. This article dives into four decision areas and highlights ways in which to think about them:… Read More »A guide to setting up analytics at a consumer tech startup
The post A guide to setting up analytics at a consumer tech startup appeared first on Data Science Central.
( 25
min )
Multi-modal data is a valuable component of the financial industry, encompassing market, economic, customer, news and social media, and risk data. Financial organizations generate, collect, and use this data to gain insights into financial operations, make better decisions, and improve performance. However, there are challenges associated with multi-modal data due to the complexity and lack […]
( 17
min )
This post is written in collaboration with Dima Zadorozhny and Fuad Babaev from VirtuSwap. VirtuSwap is a startup company developing innovative technology for decentralized exchange of assets on blockchains. VirtuSwap’s technology provides more efficient trading for assets that don’t have a direct pair between them. The absence of a direct pair leads to costly indirect trading, […]
( 9
min )
Amazon SageMaker Feature Store provides an end-to-end solution to automate feature engineering for machine learning (ML). For many ML use cases, raw data like log files, sensor readings, or transaction records need to be transformed into meaningful features that are optimized for model training. Feature quality is critical to ensure a highly accurate ML model. […]
( 12
min )
In the next decade, deep learning may revolutionize the natural sciences, enhancing our capacity to model and predict natural occurrences. This could herald a new era of scientific exploration, bringing significant advancements across sectors from drug development to renewable energy. In line with Microsoft’s mission to empower every person and every organization on the planet […]
The post Announcing the DeepSpeed4Science Initiative: Enabling large-scale scientific discovery through sophisticated AI system technologies appeared first on Microsoft Research.
( 15
min )
The 27 finalists — representing every school at MIT — will explore the technology’s impact on democracy, education, sustainability, communications, and much more.
( 10
min )
Researchers use multiple AI models to collaborate, debate, and improve their reasoning abilities to advance the performance of LLMs while increasing accountability and factual accuracy.
( 9
min )
Machine learning (ML) is becoming increasingly complex as customers try to solve more and more challenging problems. This complexity often leads to the need for distributed ML, where multiple machines are used to train a single model. Although this enables parallelization of tasks across multiple nodes, leading to accelerated training times, enhanced scalability, and improved […]
( 13
min )
This post is co-authored with Richard Alexander and Mark Hallows from Arup. Arup is a global collective of designers, consultants, and experts dedicated to sustainable development. Data underpins Arup consultancy for clients with world-class collection and analysis providing insight to make an impact. The solution presented here is to direct decision-making processes for resilient city […]
( 9
min )
Large language model development is about to reach supersonic speed thanks to a collaboration between NVIDIA and Anyscale. At its annual Ray Summit developers conference, Anyscale — the company behind the fast growing open-source unified compute framework for scalable computing — announced today that it is bringing NVIDIA AI to Ray open source and the Read article >
( 7
min )
We define a family of $C^1$ functions which we call "nowhere coexpanding
functions" that is closed under composition and includes all $C^3$ functions
with non-positive Schwarzian derivative. We establish results on the number and
nature of the fixed points of these functions, including a generalisation of a
classic result of Singer.
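For reference, the Schwarzian derivative mentioned above is the standard third-order differential operator

$$ (Sf)(x) \;=\; \frac{f'''(x)}{f'(x)} \;-\; \frac{3}{2}\left(\frac{f''(x)}{f'(x)}\right)^{2}, $$

so the class in question contains every $C^3$ function with $(Sf)(x) \leq 0$ wherever $f'(x) \neq 0$.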
( 2
min )
Feature generation aims to generate new and meaningful features to create a
discriminative representation space. A generated feature is meaningful when the
generated feature is from a feature pair with inherent feature interaction. In
the real world, experienced data scientists can identify potentially useful
feature-feature interactions, and generate meaningful dimensions from an
exponentially large search space, in an optimal crossing form over an optimal
generation path. But machines have limited human-like abilities. We generalize
such learning tasks as self-optimizing feature generation. Self-optimizing
feature generation imposes several under-addressed challenges on existing
systems: meaningful, robust, and efficient generation. To tackle these
challenges, we propose a principled and generic representation-crossing
framework to solve self-optimizing feature generation. To achieve hashing
representation, we propose a three-step approach: feature discretization,
feature hashing, and descriptive summarization. To achieve reinforcement
crossing, we develop a hierarchical reinforcement feature crossing approach. We
present extensive experimental results to demonstrate the effectiveness and
efficiency of the proposed method. The code is available at
https://github.com/yingwangyang/HRC_feature_cross.git.
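The first two steps of the hashing representation (discretization, then hashing) can be sketched in NumPy; the bin count, hash function, and output width below are illustrative choices, not the paper's implementation:

```python
import hashlib

import numpy as np

def discretize(X, n_bins=4):
    """Step 1: bucket each continuous column into equal-width bins."""
    X = np.asarray(X, dtype=float)
    binned = np.empty_like(X, dtype=int)
    for j in range(X.shape[1]):
        edges = np.linspace(X[:, j].min(), X[:, j].max(), n_bins + 1)
        binned[:, j] = np.clip(np.digitize(X[:, j], edges[1:-1]), 0, n_bins - 1)
    return binned

def hash_features(binned, dim=16):
    """Step 2: hash (column, bin) tokens into a fixed-size indicator vector."""
    out = np.zeros((binned.shape[0], dim))
    for i, row in enumerate(binned):
        for j, b in enumerate(row):
            h = int(hashlib.md5(f"{j}:{b}".encode()).hexdigest(), 16) % dim
            out[i, h] = 1.0
    return out

X = np.array([[0.1, 5.0], [0.9, 1.0], [0.5, 3.0]])
H = hash_features(discretize(X))
print(H.shape)  # (3, 16)
```

Each row of `H` is a compact hashed encoding of that sample's discretized features, giving crossing methods a fixed-size space to operate in.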
( 2
min )
Effectively leveraging multimodal information from social media posts is
essential to various downstream tasks such as sentiment analysis, sarcasm
detection and hate speech classification. However, combining text and image
information is challenging because of the idiosyncratic cross-modal semantics
with hidden or complementary information present in matching image-text pairs.
In this work, we aim to directly model this by proposing the use of two
auxiliary losses jointly with the main task when fine-tuning any pre-trained
multimodal model. Image-Text Contrastive (ITC) brings image-text
representations of a post closer together and separates them from different
posts, capturing underlying dependencies. Image-Text Matching (ITM) facilitates
the understanding of semantic correspondence between images and text by
penalizing unrelated pairs. We combine these objectives with five multimodal
models, demonstrating consistent improvements across four popular social media
datasets. Furthermore, through detailed analysis, we shed light on the specific
scenarios and cases where each auxiliary task proves to be most effective.
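A minimal NumPy sketch of an InfoNCE-style ITC objective conveys the idea (matched image-text pairs on the diagonal are pulled together, mismatched pairs pushed apart); the temperature and symmetric form are common conventions, not necessarily the paper's exact loss:

```python
import numpy as np

def itc_loss(img_emb, txt_emb, temp=0.07):
    """Symmetric InfoNCE-style image-text contrastive loss: each image should
    score its own caption (the diagonal) highest among all captions, and
    vice versa."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temp                  # temperature-scaled cosine sims
    labels = np.arange(len(img))

    def ce(l):                                   # row-wise cross-entropy
        l = l - l.max(axis=1, keepdims=True)
        p = np.exp(l) / np.exp(l).sum(axis=1, keepdims=True)
        return -np.log(p[labels, labels]).mean()

    return 0.5 * (ce(logits) + ce(logits.T))

rng = np.random.default_rng(0)
aligned = rng.normal(size=(8, 32))
loss_matched = itc_loss(aligned, aligned)              # perfectly aligned pairs
loss_random = itc_loss(aligned, rng.normal(size=(8, 32)))
print(loss_matched < loss_random)  # True
```

Aligned pairs produce a much lower loss than random pairings, which is exactly the pressure that pulls matched image-text representations together during fine-tuning.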
( 2
min )
Reasoning, as an essential ability for complex problem-solving, can provide
back-end support for various real-world applications, such as medical
diagnosis, negotiation, etc. This paper provides a comprehensive survey of
cutting-edge research on reasoning with language model prompting. We introduce
research works with comparisons and summaries and provide systematic resources
to help beginners. We also discuss the potential reasons why such reasoning
abilities emerge and highlight future research directions. Resources are
available at https://github.com/zjunlp/Prompt4ReasoningPapers (updated
periodically).
( 2
min )
In this work, we provide a characterization of the feature-learning process
in two-layer ReLU networks trained by gradient descent on the logistic loss
following random initialization. We consider data with binary labels that are
generated by an XOR-like function of the input features. We permit a constant
fraction of the training labels to be corrupted by an adversary. We show that,
although linear classifiers are no better than random guessing for the
distribution we consider, two-layer ReLU networks trained by gradient descent
achieve generalization error close to the label noise rate. We develop a novel
proof technique that shows that at initialization, the vast majority of neurons
function as random features that are only weakly correlated with useful
features, and the gradient descent dynamics 'amplify' these weak, random
features to strong, useful features.
( 2
min )
The primary goal of this research is to propose a novel architecture for a
deep neural network that can solve fractional differential equations
accurately. A Gaussian integration rule and a $L_1$ discretization technique
are used in the proposed design. In each equation, a deep neural network is
used to approximate the unknown function. Three forms of fractional
differential equations have been examined to highlight the method's
versatility: a fractional ordinary differential equation, a fractional order
integrodifferential equation, and a fractional order partial differential
equation. The results show that the proposed architecture solves different
forms of fractional differential equations with excellent precision.
( 2
min )
We present a novel local-global feature fusion framework for body-weight
exercise recognition with floor-based dynamic pressure maps. One step further
from the existing studies using deep neural networks mainly focusing on global
feature extraction, the proposed framework aims to combine local and global
features using image processing techniques and the YOLO object detection to
localize pressure profiles from different body parts and consider physical
constraints. The proposed local feature extraction method generates two sets of
high-level local features consisting of cropped pressure mapping and numerical
features such as angular orientation, location on the mat, and pressure area.
In addition, we adopt a knowledge distillation for regularization to preserve
the knowledge of the global feature extraction and improve the performance of
the exercise recognition. Our experimental results demonstrate a notable 11
percent improvement in F1 score for exercise recognition while preserving
label-specific features.
( 2
min )
In the presence of right-censored data with covariates, the conditional
Kaplan-Meier estimator (also known as the Beran estimator) consistently
estimates the conditional survival function of the random follow-up for the
event of interest. However, a necessary condition is the unambiguous knowledge
of whether each individual is censored or not, which may be incomplete in
practice. We therefore propose a study of the Beran estimator when the
censoring indicators are generic random variables and discuss necessary
conditions for the efficiency of the Beran estimator. From this, we provide a
new estimator for the conditional survival function with missing not at random
(MNAR) censoring indicators based on a conditional copula model for the
missingness mechanism. In addition to the theoretical results, we illustrate
how the estimators work for small samples through a simulation study and show
their practical applicability by analyzing synthetic and real data.
( 2
min )
The task of preserving privacy while ensuring efficient communication is a
fundamental challenge in federated learning. In this work, we tackle this
challenge in the trusted aggregator model, and propose a solution that achieves
both objectives simultaneously. We show that employing a quantization scheme
based on subtractive dithering at the clients can effectively replicate the
normal noise addition process at the aggregator. This implies that we can
guarantee the same level of differential privacy against other clients while
substantially reducing the amount of communication required, as opposed to
transmitting full precision gradients and using central noise addition. We also
experimentally demonstrate that the accuracy of our proposed approach matches
that of the full precision gradient method.
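A minimal sketch of the subtractive-dithering idea follows: a seed shared between client and aggregator lets the aggregator reconstruct and subtract the client's dither, so the recovered value equals the gradient plus bounded uniform noise. The step size and seed handling here are illustrative assumptions, and no differential-privacy noise calibration is shown:

```python
import numpy as np

def client_quantize(grad, step, seed):
    """Client side: add a pseudo-random dither (reproducible from the shared
    seed), then round to the lattice -- only integer indices are transmitted."""
    rng = np.random.default_rng(seed)
    dither = rng.uniform(-step / 2, step / 2, size=grad.shape)
    return np.round((grad + dither) / step).astype(int)

def server_dequantize(indices, step, seed):
    """Aggregator side: rebuild the same dither from the seed and subtract it.
    The result is grad + uniform noise on [-step/2, step/2], independent of
    the gradient value itself."""
    rng = np.random.default_rng(seed)
    dither = rng.uniform(-step / 2, step / 2, size=indices.shape)
    return indices * step - dither

grad = np.array([0.31, -1.24, 0.05])
idx = client_quantize(grad, step=0.1, seed=7)
recovered = server_dequantize(idx, step=0.1, seed=7)
print(np.max(np.abs(recovered - grad)))  # bounded by step/2 = 0.05
```

Because only small integers cross the network, communication shrinks substantially, while the effective uniform noise plays the role that explicit noise addition would otherwise serve at the aggregator.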
( 2
min )
The recipe behind the success of deep learning has been the combination of
neural networks and gradient-based optimization. Understanding the behavior of
gradient descent however, and particularly its instability, has lagged behind
its empirical success. To add to the theoretical tools available to study
gradient descent we propose the principal flow (PF), a continuous time flow
that approximates gradient descent dynamics. To our knowledge, the PF is the
only continuous flow that captures the divergent and oscillatory behaviors of
gradient descent, including escaping local minima and saddle points. Through
its dependence on the eigendecomposition of the Hessian the PF sheds light on
the recently observed edge of stability phenomena in deep learning. Using our
new understanding of instability we propose a learning rate adaptation method
which enables us to control the trade-off between training stability and test
set evaluation performance.
( 2
min )
Markov processes are widely used mathematical models for describing dynamic
systems in various fields. However, accurately simulating large-scale systems
at long time scales is computationally expensive due to the short time steps
required for accurate integration. In this paper, we introduce an inference
process that maps complex systems into a simplified representational space and
models large jumps in time. To achieve this, we propose Time-lagged Information
Bottleneck (T-IB), a principled objective rooted in information theory, which
aims to capture relevant temporal features while discarding high-frequency
information to simplify the simulation task and minimize the inference error.
Our experiments demonstrate that T-IB learns information-optimal
representations for accurately modeling the statistical properties and dynamics
of the original process at a selected time lag, outperforming existing
time-lagged dimensionality reduction methods.
( 2
min )
The robotic manipulation of Deformable Linear Objects (DLOs) is a vital and
challenging task that is important in many practical applications. Classical
model-based approaches to this problem require an accurate model to capture how
robot motions affect the deformation of the DLO. Nowadays, data-driven models
offer the best tradeoff between quality and computation time. This paper
analyzes several learning-based 3D models of the DLO and proposes a new one
based on the Transformer architecture that achieves superior accuracy, even on
DLOs of different lengths, thanks to the proposed scaling method. Moreover,
we introduce a data augmentation technique, which improves the prediction
performance of almost all considered DLO data-driven models. Thanks to this
technique, even a simple Multilayer Perceptron (MLP) achieves close to
state-of-the-art performance while being significantly faster to evaluate. In
the experiments, we compare the performance of the learning-based 3D models of
the DLO on several challenging datasets quantitatively and demonstrate their
applicability in the task of shaping a DLO.
( 2
min )
We present a Split Vector Quantized Variational Autoencoder (SVQ-VAE)
architecture using a split vector quantizer for NTTS, as an enhancement to the
well-known Variational Autoencoder (VAE) and Vector Quantized Variational
Autoencoder (VQ-VAE) architectures. Compared to these previous architectures,
our proposed model retains the benefits of using an utterance-level bottleneck,
while keeping significant representation power and a discretized latent space
small enough for efficient prediction from text. We train the model on
recordings in the expressive task-oriented dialogues domain and show that
SVQ-VAE achieves a statistically significant improvement in naturalness over
the VAE and VQ-VAE models. Furthermore, we demonstrate that the SVQ-VAE latent
acoustic space is predictable from text, reducing the gap between the standard
constant vector synthesis and vocoded recordings by 32%.
( 2
min )
Integrating variable renewable energy into the grid has posed challenges to
system operators in achieving optimal trade-offs among energy availability,
cost affordability, and pollution controllability. This paper proposes a
multi-agent reinforcement learning framework for managing energy transactions
in microgrids. The framework addresses the challenges above: it seeks to
optimize the usage of available resources by minimizing the carbon footprint
while benefiting all stakeholders. The proposed architecture consists of three
layers of agents, each pursuing different objectives. The first layer,
comprised of prosumers and consumers, minimizes the total energy cost. The
other two layers control the energy price to decrease the carbon impact while
balancing the consumption and production of both renewable and conventional
energy. This framework also takes into account fluctuations in energy demand
and supply.
( 2
min )
In the present paper we introduce new optimization algorithms for the task of
density ratio estimation. More precisely, we consider extending the well-known
KMM method using the construction of a suitable loss function, in order to
encompass more general situations involving the estimation of density ratio
with respect to subsets of the training data and test data, respectively. The
associated codes can be found at https://github.com/CDAlecsa/Generalized-KMM.
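As a rough illustration of the classical KMM building block that the paper extends, here is a small projected-gradient sketch of the standard KMM objective; the kernel, box bound, and optimizer are illustrative choices, and the paper's generalized loss is not reproduced:

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between row sets A and B."""
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

def kmm_weights(X_tr, X_te, gamma=1.0, B=10.0, steps=500, lr=1e-2):
    """Projected gradient descent on the standard KMM objective
    1/2 b^T K b - kappa^T b, with box constraints 0 <= b <= B."""
    K = rbf(X_tr, X_tr, gamma)
    kappa = (len(X_tr) / len(X_te)) * rbf(X_tr, X_te, gamma).sum(axis=1)
    b = np.ones(len(X_tr))
    for _ in range(steps):
        grad = K @ b - kappa
        b = np.clip(b - lr * grad, 0.0, B)  # project back onto the box
    return b

rng = np.random.default_rng(0)
X_tr = rng.normal(0.0, 1.0, size=(200, 1))   # training distribution
X_te = rng.normal(1.0, 1.0, size=(200, 1))   # shifted test distribution
w = kmm_weights(X_tr, X_te)
# training points near the test mean (x ~ 1) should receive larger weights
print(w[X_tr[:, 0] > 0.5].mean() > w[X_tr[:, 0] < -0.5].mean())  # True
```

The learned weights approximate the density ratio between test and training distributions, which is the quantity the generalized loss functions in the paper estimate for subsets of the data.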
( 2
min )
In machine learning models, the estimation of errors is often complex due to
distribution bias, particularly in spatial data such as those found in
environmental studies. We introduce an approach based on the ideas of
importance sampling to obtain an unbiased estimate of the target error. By
taking into account the difference between the target distribution and the
available data, our method reweights errors at each sample point and
neutralizes the shift. Importance sampling and kernel density estimation were
used for the reweighting. We validate the effectiveness of our approach using
artificial data that resemble real-world spatial datasets. Our findings
demonstrate the advantages of the proposed approach for estimating the target
error, offering a solution to the distribution shift problem. The overall
prediction error dropped from 7% to just 2%, and it shrinks further for larger
samples.
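The reweighting idea can be sketched with plain NumPy (a 1-D Gaussian KDE for both densities and a self-normalized importance-sampling estimate); the densities, bandwidth, and error model below are synthetic stand-ins, not the paper's setup:

```python
import numpy as np

def kde(points, h):
    """1-D Gaussian kernel density estimate, returned as a callable."""
    pts = np.asarray(points, float)
    return lambda x: np.mean(
        np.exp(-0.5 * ((x[:, None] - pts[None, :]) / h) ** 2), axis=1
    ) / (h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(1)
# available samples are biased toward small x; the target covers larger x
x_avail = rng.normal(0.0, 1.0, 2000)
x_target = rng.normal(1.0, 1.0, 2000)
err = x_avail ** 2                    # per-sample error, growing with |x|

p_avail, p_target = kde(x_avail, 0.3), kde(x_target, 0.3)
w = p_target(x_avail) / p_avail(x_avail)     # importance weights
naive = err.mean()                           # biased: ignores the shift
reweighted = np.sum(w * err) / np.sum(w)     # self-normalized IS estimate
true = (x_target ** 2).mean()                # error under the target
print(abs(reweighted - true) < abs(naive - true))  # True
```

Reweighting each sample's error by the estimated density ratio moves the estimate toward the error under the target distribution, which the naive average systematically understates here.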
( 2
min )
Hurricanes present major challenges in the U.S. due to their devastating
impacts. Mitigating these risks is important, and the insurance industry is
central in this effort, using intricate statistical models for risk assessment.
However, these models often neglect key temporal and spatial hurricane patterns
and are limited by data scarcity. This study introduces a refined approach
combining the ARIMA model and K-MEANS to better capture hurricane trends, and
an Autoencoder for enhanced hurricane simulations. Our experiments show that
this hybrid methodology effectively simulates historical hurricane behaviors
while providing detailed projections of potential future trajectories and
intensities. Moreover, by leveraging a comprehensive yet selective dataset, our
simulations enrich the current understanding of hurricane patterns and offer
actionable insights for risk management strategies.
( 2
min )
Knowledge Graphs (KGs) often have two characteristics: heterogeneous graph
structure and text-rich entity/relation information. Text-based KG embeddings
can represent entities by encoding descriptions with pre-trained language
models, but no open-sourced library is specifically designed for KGs with PLMs
at present. In this paper, we present LambdaKG, a library for KG embeddings
that is equipped with many pre-trained language models (e.g., BERT, BART, T5,
GPT-3), and
supports various tasks (e.g., knowledge graph completion, question answering,
recommendation, and knowledge probing). LambdaKG is publicly open-sourced at
https://github.com/zjunlp/PromptKG/tree/main/lambdaKG, with a demo video at
this http URL and long-term maintenance.
( 2
min )
Open-ended learning benefits immensely from the use of symbolic methods for
goal representation as they offer ways to structure knowledge for efficient and
transferable learning. However, the existing Hierarchical Reinforcement
Learning (HRL) approaches relying on symbolic reasoning are often limited as
they require a manual goal representation. The challenge in autonomously
discovering a symbolic goal representation is that it must preserve critical
information, such as the environment dynamics. In this work, we propose a
developmental mechanism for subgoal discovery via an emergent representation
that abstracts (i.e., groups together) sets of environment states that have
similar roles in the task. We create an HRL algorithm that gradually learns this
representation along with the policies and evaluate it on navigation tasks to
show the learned representation is interpretable and results in data
efficiency.
( 2
min )
In the presence of heterogeneous data, where randomly rotated objects fall
into multiple underlying categories, it is challenging to simultaneously
classify them into clusters and synchronize them based on pairwise relations.
This gives rise to the joint problem of community detection and
synchronization. We propose a series of semidefinite relaxations, and prove
their exact recovery when extending the celebrated stochastic block model to
this new setting where both rotations and cluster identities are to be
determined. Numerical experiments demonstrate the efficacy of our proposed
algorithms and confirm our theoretical result which indicates a sharp phase
transition for exact recovery.
( 2
min )
We aim to provide a general framework for computational photography that
recovers the real scene from imperfect images, via Deep Nonparametric
Convexified Filtering (DNCF). It consists of a nonparametric deep network to
resemble the physical equations behind the image formation, such as denoising,
super-resolution, inpainting, and flash. DNCF has no parameterization dependent
on training data, therefore has a strong generalization and robustness to
adversarial image manipulation. During inference, we also encourage the network
parameters to be nonnegative and create a bi-convex function on the input and
parameters, and this adapts to second-order optimization algorithms with
insufficient running time, having 10X acceleration over Deep Image Prior. With
these tools, we empirically verify its capability to defend image
classification deep networks against adversary attack algorithms in real-time.
( 2
min )
We consider the problem of approximating the regression function from noisy
vector-valued data by an online learning algorithm using an appropriate
reproducing kernel Hilbert space (RKHS) as prior. In an online algorithm,
i.i.d. samples become available one by one by a random process and are
successively processed to build approximations to the regression function. We
are interested in the asymptotic performance of such online approximation
algorithms and show that the expected squared error in the RKHS norm can be
bounded by $C^2 (m+1)^{-s/(2+s)}$, where $m$ is the current number of processed
data, the parameter $0<s\leq 1$ expresses an additional smoothness assumption
on the regression function and the constant $C$ depends on the variance of the
input noise, the smoothness of the regression function and further parameters
of the algorithm.
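The online setting can be illustrated with a small kernel stochastic-gradient sketch, one update per incoming sample; the kernel, bandwidth, and step-size schedule below are illustrative assumptions rather than the algorithm analyzed above:

```python
import numpy as np

def gauss_k(a, b, h=0.5):
    """Gaussian RKHS kernel."""
    return np.exp(-((a - b) ** 2) / (2 * h ** 2))

rng = np.random.default_rng(0)
f_star = np.sin                    # unknown regression function
centers, coefs = [], []            # f_m(.) = sum_i coefs[i] * K(centers[i], .)

def f_hat(x):
    if not centers:
        return 0.0
    return float(np.dot(coefs, gauss_k(np.array(centers), x)))

# i.i.d. samples arrive one by one; each step is one kernel-SGD update
for m in range(1, 2001):
    x = rng.uniform(-np.pi, np.pi)
    y = f_star(x) + rng.normal(0, 0.1)   # noisy observation
    step = 0.5 / m ** 0.5                # decaying learning rate
    residual = f_hat(x) - y
    centers.append(x)                    # f <- f - step * residual * K(x, .)
    coefs.append(-step * residual)

xs = np.linspace(-3, 3, 50)
mse = np.mean([(f_hat(x) - np.sin(x)) ** 2 for x in xs])
print(mse)  # small after 2000 online updates
```

Each sample adds one kernel term, so the approximation is built successively without ever revisiting past data, mirroring the online processing the abstract describes.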
( 2
min )
Benign overfitting, the phenomenon where interpolating models generalize well
in the presence of noisy data, was first observed in neural network models
trained with gradient descent. To better understand this empirical observation,
we consider the generalization error of two-layer neural networks trained to
interpolation by gradient descent on the logistic loss following random
initialization. We assume the data comes from well-separated class-conditional
log-concave distributions and allow for a constant fraction of the training
labels to be corrupted by an adversary. We show that in this setting, neural
networks exhibit benign overfitting: they can be driven to zero training error,
perfectly fitting any noisy training labels, and simultaneously achieve minimax
optimal test error. In contrast to previous work on benign overfitting that
require linear or kernel-based predictors, our analysis holds in a setting
where both the model and learning dynamics are fundamentally nonlinear.
( 2
min )
The recipe behind the success of deep learning has been the combination of
neural networks and gradient-based optimization. Understanding the behavior of
gradient descent however, and particularly its instability, has lagged behind
its empirical success. To add to the theoretical tools available to study
gradient descent we propose the principal flow (PF), a continuous time flow
that approximates gradient descent dynamics. To our knowledge, the PF is the
only continuous flow that captures the divergent and oscillatory behaviors of
gradient descent, including escaping local minima and saddle points. Through
its dependence on the eigendecomposition of the Hessian, the PF sheds light on
the recently observed edge-of-stability phenomenon in deep learning. Using our
new understanding of instability we propose a learning rate adaptation method
which enables us to control the trade-off between training stability and test
set evaluation performance.
( 2
min )
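The divergent and oscillatory behavior the abstract refers to can already be seen in the simplest possible setting: gradient descent on a one-dimensional quadratic. This is a standard textbook illustration, not the paper's principal flow:

```python
def gd(lr, lam=4.0, x0=1.0, steps=20):
    """Gradient descent on the quadratic f(x) = (lam / 2) * x**2.

    The update x <- x - lr * lam * x multiplies x by (1 - lr * lam) each
    step, so the iterates converge iff |1 - lr * lam| < 1, i.e. lr < 2 / lam.
    For 1/lam < lr < 2/lam the sign flips every step: oscillatory convergence.
    """
    x = x0
    traj = [x]
    for _ in range(steps):
        x -= lr * lam * x
        traj.append(x)
    return traj

stable = gd(lr=0.3)    # 1 - 0.3 * 4 = -0.2: oscillates but converges
unstable = gd(lr=0.6)  # 1 - 0.6 * 4 = -1.4: oscillates and diverges
print(stable[-1], unstable[-1])
```

In deep networks the role of `lam` is played by the largest Hessian eigenvalue, which changes during training; the edge-of-stability regime is precisely where `lr` hovers around `2 / lam`.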
Large language model (LLM) agents are programs that extend the capabilities of standalone LLMs with 1) access to external tools (APIs, functions, webhooks, plugins, and so on), and 2) the ability to plan and execute tasks in a self-directed fashion. Often, LLMs need to interact with other software, databases, or APIs to accomplish complex tasks. […]
( 13
min )
With Style2Fab, makers can rapidly customize models of 3D-printable objects, such as assistive devices, without hampering their functionality.
( 10
min )
In the first part of this multi-part blog series, you will learn how to create a scalable training pipeline and prepare training data for Amazon Comprehend custom classification models. We will introduce a custom classifier training pipeline that can be deployed in your AWS account with a few clicks.
( 10
min )
Today, generative AI models cover a variety of tasks, from text summarization and Q&A to image and video generation. To improve the quality of output, approaches like n-shot learning, prompt engineering, Retrieval Augmented Generation (RAG), and fine-tuning are used. Fine-tuning allows you to adjust these generative AI models to achieve improved performance on your domain-specific […]
( 8
min )
This post takes you through the most common challenges that customers face when searching internal documents, and gives you concrete guidance on how AWS services can be used to create a generative AI conversational bot that makes internal information more useful. Unstructured data accounts for 80% of all the data found within organizations, consisting of […]
( 14
min )
Modern applications heavily rely on robust network infrastructure, requiring continuous innovation. In this evolving landscape, Microsoft is at the forefront, spearheading innovation efforts in networking and strengthening the foundational network infrastructure that underpins the cloud ecosystem. By investing in and enhancing this critical infrastructure, Microsoft not only ensures the resilience and scalability of cloud services […]
The post Microsoft at ACM SIGCOMM 2023: Innovating the future of networking appeared first on Microsoft Research.
( 10
min )
What’s the driving force behind AI’s recent, rapid progress? Research manager Ahmed Awadallah shares his insights on this, the two-stage approach to training large-scale models, and the need for better model evaluation in this episode of the #MSRPodcast.
The post AI Frontiers: The future of scale with Ahmed Awadallah and Ashley Llorens appeared first on Microsoft Research.
( 31
min )
Working as a data scientist is the dream of many IT professionals these days. It is no secret that data science is a skyrocketing field attracting young professionals and inspiring many to switch careers to data science. On one front are young professionals who study their courses in colleges to pursue their dream of becoming…
The post Are data science certifications the gateway to competitive pay? appeared first on Data Science Central.
( 19
min )
CUPED: Improve Your A/B Testing - Detect Smaller Gains, Utilise Smaller Samples and Make Smarter Decisions!
The post CUPED for starters: Enhancing controlled experiments with pre-experiment data appeared first on Data Science Central.
( 26
min )
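The core of CUPED is a one-line variance-reduction trick: subtract from the experiment metric the part of it that a pre-experiment covariate already predicts. A minimal sketch on simulated data (variable names and parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated experiment: a pre-experiment metric x (e.g. last month's
# activity) is correlated with the in-experiment metric y.
n = 1000
x = rng.normal(100.0, 20.0, n)            # pre-experiment covariate
y = 0.8 * x + rng.normal(0.0, 10.0, n)    # in-experiment outcome

# CUPED adjustment: y_cuped = y - theta * (x - mean(x)), with theta
# chosen (as the OLS slope of y on x) to minimise the adjusted variance.
theta = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
y_cuped = y - theta * (x - x.mean())

# The mean is unchanged, but the variance shrinks by the squared
# correlation between x and y, so smaller effects become detectable.
reduction = 1.0 - np.var(y_cuped) / np.var(y)
print(f"variance reduced by {reduction:.1%}")
```

Because centering by `x.mean()` leaves the mean of `y_cuped` equal to the mean of `y`, the treatment-effect estimate is unbiased while its standard error shrinks, which is what lets experiments detect smaller gains with smaller samples.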
The best way to model business and consumer dynamics is collaboratively, with stakeholders all in the same virtual room contributing. Of course, this has been happening asynchronously for some time now, but the potential exists for more real-time interaction. Modelers don’t work in a vacuum, of course. The iterations between a modeler who develops a…
The post Collaborative visual knowledge graph modeling at the system level appeared first on Data Science Central.
( 20
min )
GFN Thursday is downright demonic, as Devil May Cry 5 comes to GeForce NOW. Capcom’s action-packed third-person brawler leads 15 titles joining the GeForce NOW library this week, including Gears Tactics and The Crew Motorfest. It’s also the last week to take on the Ultimate KovaaK’s Challenge. Get on the leaderboard today for a chance…
( 6
min )
The machine-learning method works on most mobile devices and could be expanded to assess other motor disorders outside of the doctor’s office.
( 10
min )
Although computer scientists may initially treat data bias and error as a nuisance, researchers argue it’s a hidden treasure trove for reflecting societal values.
( 10
min )
Researchers use synthetic data to improve a model’s ability to grasp conceptual information, which could enhance automatic captioning and question-answering systems.
( 10
min )
Searching for insights in a repository of free-form text documents can be like finding a needle in a haystack. A traditional approach might be to use word counting or other basic analysis to parse documents, but with the power of Amazon AI and machine learning (ML) tools, we can gather deeper understanding of the content. […]
( 8
min )
In this issue: Efficient polyglot analytics on semantic data aids query performance; generative retrieval for conversational question answering improves dialogue-based interfaces; a new tool uses ML to address capacity degradation in lithium-ion batteries.
The post Research Focus: Week of September 11, 2023 appeared first on Microsoft Research.
( 9
min )
Generative AI-based models can not only learn and understand natural languages — they can learn the very language of nature itself, presenting new possibilities for scientific research. Anima Anandkumar, Bren Professor at Caltech and senior director of AI research at NVIDIA, was recently invited to speak at the President’s Council of Advisors on Science and…
( 5
min )
We’re growing our presence in Europe with an office in Dublin, Ireland.
( 2
min )
In an event at the White House today, NVIDIA announced support for voluntary commitments that the Biden Administration developed to ensure advanced AI systems are safe, secure and trustworthy. The news came the same day NVIDIA’s chief scientist, Bill Dally, testified before a U.S. Senate subcommittee seeking input on potential legislation covering generative AI. Separately…
( 6
min )
Generative AI’s transformative effect on the auto industry took center stage last week at the International Motor Show Germany, known as IAA, in Munich. NVIDIA’s Danny Shapiro, VP of automotive marketing, explained in his IAA keynote how this driving force is accelerating innovation and streamlining processes — from advancing design, engineering and digital-twin deployment for…
( 7
min )
Ten miles in from Long Island’s Atlantic coast, Shinjae Yoo is revving his engine. The computational scientist and machine learning group lead at the U.S. Department of Energy’s Brookhaven National Laboratory is one of many researchers gearing up to run quantum computing simulations on a supercomputer for the first time, thanks to new software. Yoo’s…
( 6
min )
Editor’s note: This post is part of our weekly In the NVIDIA Studio series, which celebrates featured artists, offers creative tips and tricks and demonstrates how NVIDIA Studio technology improves creative workflows. When it comes to converting 2D concepts into 3D masterpieces, self-taught visual development artist Alex Treviño has confidence in the potential of all…
( 7
min )
Businesses today constantly strive to gain a competitive edge in their marketing efforts. Leveraging their data effectively to create data-driven campaigns is the best way to trump the competition. One of the best tools at their disposal to utilize their data is a data warehouse. Data warehousing is crucial in enhancing marketing and campaign management…
The post Data Warehousing: The key to effective marketing campaign management appeared first on Data Science Central.
( 21
min )
The way we work has changed, with remote teams now a common part of the landscape. While remote work offers flexibility, it also brings challenges. Managing remote teams effectively is crucial to ensure productivity and collaboration. In this article, we’ll explore how using time tracking for remote teams can help manage employees’ performance better. Time-tracking…
The post Data-driven insights: Improving remote team performance with time-tracking analytics appeared first on Data Science Central.
( 21
min )
In our increasingly interconnected world, the digital realm has become both a frontier of innovation and a battleground of threats. As technology advances, so do the tactics of malicious actors who seek to exploit vulnerabilities in our digital infrastructure. The rapid evolution of cyber threats calls for a paradigm shift in defense strategies, and that’s…
The post AI and the cyber challenge: Bridging vulnerabilities in modern defense strategies appeared first on Data Science Central.
( 22
min )
This research paper was presented at the 28th ACM SIGPLAN International Conference on Functional Programming (opens in new tab) (ICFP), a premier forum for discussing design, implementations, principles, and uses of functional programming. Functional programming languages offer a host of advantages, such as ensuring memory safety (opens in new tab) and eliminating arbitrary side effects. […]
The post FP2: Fully In-Place Functional Programming provides memory reuse for pure functional programs appeared first on Microsoft Research.
( 10
min )
Today, we are excited to announce the simplified Quick setup experience in Amazon SageMaker. With this new capability, individual users can launch Amazon SageMaker Studio with default presets in minutes. SageMaker Studio is an integrated development environment (IDE) for machine learning (ML). ML practitioners can perform all ML development steps—from preparing their data to building, […]
( 6
min )
This post addresses the challenge faced by developers and support teams when application logs are presented in languages other than English, making it difficult for them to debug and provide support. The proposed solution uses Amazon Translate to automatically translate non-English logs in CloudWatch, and provides step-by-step guidance on deploying the solution in your environment.
( 6
min )
In this post, we share how SageMaker helps the data science team at Scalable manage the lifecycle of a data science project efficiently, using their email classifier project as an example. The lifecycle starts with the initial phase of data analysis and exploration with SageMaker Studio; moves on to model experimentation and deployment with SageMaker training, inference, and Hugging Face DLCs; and completes with a training pipeline with SageMaker Pipelines integrated with other AWS services.
( 10
min )
The system could improve image quality in video streaming or help autonomous vehicles identify road hazards in real-time.
( 10
min )
Today, we are excited to announce that the Falcon 180B foundation model developed by Technology Innovation Institute (TII) is available for customers through Amazon SageMaker JumpStart to deploy with one click for running inference. With a 180-billion-parameter size and trained on a massive 3.5-trillion-token dataset, Falcon 180B is the largest and one of the most performant models with openly accessible weights. You can try out this model with SageMaker JumpStart, a machine learning (ML) hub that provides access to algorithms, models, and ML solutions so you can quickly get started with ML. In this post, we walk through how to discover and deploy the Falcon 180B model via SageMaker JumpStart.
( 14
min )
Amazon SageMaker Domain supports SageMaker machine learning (ML) environments, including SageMaker Studio and SageMaker Canvas. SageMaker Studio is a fully integrated development environment (IDE) that provides a single web-based visual interface where you can access purpose-built tools to perform all ML development steps, from preparing data to building, training, and deploying your ML models, improving […]
( 10
min )
In its debut on the MLPerf industry benchmarks, the NVIDIA GH200 Grace Hopper Superchip ran all data center inference tests, extending the leading performance of NVIDIA H100 Tensor Core GPUs. The overall results showed the exceptional performance and versatility of the NVIDIA AI platform from the cloud to the network’s edge. Separately, NVIDIA announced inference…
( 7
min )
“Lightning” system connects photons to the electronic components of computers using a novel abstraction, creating the first photonic computing prototype to serve real-time machine-learning inference requests.
( 9
min )
Sponsored Post: The Center for Business Analytics at the University of Cincinnati will present its annual Data Science Symposium 2022 on November 8. This all-day, in-person event will have three featured speakers and two tech talk tracks with four concurrent presentations in each track. The […]
The post Attend the Data Science Symposium 2022, November 8 in Cincinnati appeared first on Machine Learning Mastery.
( 10
min )